Teradata Overview

Teradata: An Overview

Access patterns are different

The access patterns of the two approaches (transaction processing and decision support) are very different, and hence they make very different demands on the underlying database engine. The basic database architecture has to be different to be optimized for one type of processing or the other. Teradata is a leader in the DSS and data warehouse space.

What is Teradata?

Teradata is a Relational Database Management System (RDBMS) composed of hardware and software, designed for the world's largest commercial databases. It is used by customers looking for answers to their business questions from data of over 1 terabyte, including:

- 6 of the top 10 retailers
- 6 of the top 9 communications companies
- Over 40% of the world's leading manufacturers
- 3 of the top 4 Blue Cross/Blue Shield insurance companies
- Many of the world's leading banks

Teradata: a brief history

- 1979 - Teradata Corp founded in Los Angeles, California; development begins on a massively parallel database computer
- 1984 - Teradata sells its first DBC/1012
- 1986 - Product of the Year
- 1990 - First terabyte system installed and in production
- 1992 - Teradata is merged into NCR
- 1995 - Teradata Version 2 for UNIX operating systems released

Why Teradata?

Capacity:

Scales from gigabytes to terabytes of detailed data stored in billions of rows, and to thousands of millions of instructions per second (MIPS) to process that data.

Performance:

A shared-nothing architecture achieves parallelism at each and every stage of query execution, making the Teradata Database faster than other relational systems.

Single Data Store:

Can be accessed by both network-attached and channel-attached systems, and supports the requirements of many diverse clients.

Fault Tolerance & Availability:

High fault tolerance with no single point of failure; the system automatically detects and recovers from hardware failures.

Data Integrity:

Ensures that transactions either complete or roll back to a stable state if a fault occurs.

Scalability:

Linearly expandable: as your database grows, additional nodes may be added, allowing expansion without sacrificing performance.

Teradata Architecture: the SMP

An SMP node contains CPUs (processors), PEs, AMPs, and their vdisks.

Vprocs (virtual processors) are sets of software processes running on a node. Each vproc is a separate, independent copy of the processor software, isolated from the other vprocs but sharing some of the node's physical resources, such as memory and CPUs.

The Parsing Engine (PE):

- Checks the SQL syntax
- Checks resource availability and access rights
- Parses the SQL
- Generates AMP steps and creates the plan
- Dispatches the steps to the AMPs over the BYNET
- Creates answer sets for clients
- Performs EBCDIC-ASCII conversion
- Handles up to 120 user sessions

The AMPs:

- Store and retrieve rows to and from the disks
- Lock management
- Sort rows and aggregate columns
- Join processing
- Output conversion and formatting
- Disk space management and accounting
- Special utility protocols
- Recovery processing

This is called SMP (Symmetric Multiprocessing): a multiprocessing node that contains a number of central processing units sharing a single memory pool. "Shared-nothing architecture" means that each AMP has its own disk (data), shares it with no other AMP, and is solely responsible for any changes to or access of that data.
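As a minimal sketch of the shared-nothing idea (plain Python; the `Amp` class and `owning_amp` routing function are illustrative, not Teradata interfaces), each AMP owns a private row store and is the only component that ever reads or writes those rows:

```python
# Illustrative sketch of "shared nothing": each AMP owns a private row store
# and is the only component that ever touches those rows.

class Amp:
    def __init__(self, amp_id):
        self.amp_id = amp_id
        self.rows = {}                      # this AMP's "vdisk"; no other AMP sees it

    def insert(self, key, row):
        self.rows[key] = row

    def select(self, key):
        return self.rows.get(key)

amps = [Amp(i) for i in range(4)]

def owning_amp(primary_index_value):
    # Stand-in for Teradata's hash map: route a PI value to exactly one AMP.
    return amps[hash(primary_index_value) % len(amps)]

owning_amp("cust-1001").insert("cust-1001", {"name": "Acme"})
print(owning_amp("cust-1001").select("cust-1001"))  # only the owning AMP holds this row
```

Because no two AMPs ever touch the same rows, work on different rows can proceed in parallel without cross-AMP coordination, which is the property the slide is pointing at.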

And then comes the MPP

MPP (Massively Parallel Processing) consists of a number of nodes (SMPs) that work on a problem at the same time. Each node (SMP) has one or more CPUs, its own memory, I/O, network connections, and disk arrays, and does not share its resources with other nodes.

BYNET: a dual-redundant, fault-tolerant, bi-directional interconnect network that enables:

- Automatic load balancing of message traffic
- Automatic reconfiguration after fault detection
- Scalable bandwidth as nodes are added

The BYNET is responsible for:

- Broadcast, multicast, and point-to-point communications between nodes and virtual processors
- Merging answer sets back to the PE
- Making Teradata parallelism possible

Important components

SMP - Symmetric Multiprocessing: a single node that contains multiple CPUs sharing a memory pool.

MPP - SMP nodes combined with a communication network (the BYNET) form an MPP. An MPP comprises two or more loosely coupled SMP nodes connected by the BYNET, with shared SCSI access to multiple disk arrays.

BYNET - The hardware inter-processor network that links the nodes of an MPP system. It implements point-to-point, multicast, or broadcast communication depending on the situation, and is typically used for merging and sorting data from different nodes; the accumulated data is then sent back to the user.

Disk array - Teradata employs RAID storage technology, where drives are configured logically into one or more logical units (LUNs). Each LUN is further sliced into pdisks, which are assigned to AMPs; the group of pdisks assigned to an AMP is called a vdisk (see the storage sketch below).
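A rough illustration of that LUN / pdisk / vdisk grouping in Python (the names and the round-robin assignment policy here are hypothetical; real configuration is done with Teradata utilities):

```python
# Hypothetical illustration of LUN -> pdisk -> vdisk grouping.
# Each LUN is sliced into pdisks; the pdisks assigned to one AMP form its vdisk.

luns = {"LUN0": ["pdisk0", "pdisk1"], "LUN1": ["pdisk2", "pdisk3"]}

# Flatten all pdisks and deal them out round-robin to the AMPs.
all_pdisks = [p for pdisk_list in luns.values() for p in pdisk_list]
num_amps = 2
vdisks = {amp: all_pdisks[amp::num_amps] for amp in range(num_amps)}

print(vdisks)   # {0: ['pdisk0', 'pdisk2'], 1: ['pdisk1', 'pdisk3']}
```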

More definitions

PDE - Parallel Database Extension: an interface layer on top of the operating system. It enhances processing by providing parallel processing and priority scheduling, and it executes the vprocs. It takes advantage of the BYNET and shared disk hardware to improve performance, and can be visualized as a layer on top of the operating system.

File System - Teradata File System service calls allow the Teradata RDBMS to store and retrieve data efficiently without being concerned about the underlying operating system interfaces. It divides the disk into logical blocks (MI, CI, CID, DB, DBD).

TPA - Teradata Parallel Application: responsible for the distribution, coordination, and balancing of processes/threads across nodes.

TDP - Teradata Director Program: responsible for session balancing across multiple PEs, failure notification, logging, verification, recovery, restart, and security.

Logical processors

VPROCs - Virtual processors: sets of software processes that run on a node under the Teradata PDE within the multitasking environment of the operating system. A single node (SMP) can have up to 128 vprocs.

PE - The Parsing Engine performs session control and dispatches tasks to fetch, return, and merge data. It communicates with the client system on one side and with the AMPs on the other side (via the BYNET).

AMP - The Access Module Processor retrieves and updates data on the virtual disks. It is responsible for locking, joining, sorting, aggregation, data conversion, disk space management, accounting, and journaling.

A single PE handles one request at a time: the request is parsed and optimized, steps are built, and the steps are dispatched to the corresponding AMP(s). Each AMP has 80 worker tasks that perform the different kinds of work associated with the steps. If the request is a SELECT, the worker tasks, once finished, send their data to the BYNET, where it is merged and sorted, and the PE then dispatches the resulting data to the user.

Query lifecycle

Example requests: SELECT * FROM t1 WHERE id = 4; (single-AMP) and SELECT * FROM t1 WHERE id IN (2,8); (multi-AMP).

1. The application sends the request to the PE (via CLI and the TDP, the Teradata Director Program); the PE acknowledges it to the application.
2. The SQL is parsed by the PE.
3. The PE uses the hash map to locate the AMP (single-AMP request) or AMPs (multi-AMP request) that hold the qualifying rows.
4. The PE sends the request to that AMP or to the individual AMPs, which acknowledge it back to the PE.
5. Each AMP retrieves the data from its own vdisk.
6. For a single-AMP request, the AMP sends the data directly to the PE; for a multi-AMP request, the AMPs send their data to the BYNET, which merges it and passes the merged data to the PE.
7. The PE sends the result to the application, and the application acknowledges it back to the PE.

(The original diagram shows rows with PI values 1-8 and descriptions A-H spread across four AMPs and their vdisks.)
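A toy end-to-end simulation of that lifecycle in Python (the `ParsingEngine` and `Amp` classes are illustrative, not Teradata components): a single-value request touches exactly one AMP, while an IN-list request fans out to several AMPs and the partial answer sets are merged before the result is returned.

```python
# Toy simulation of the query lifecycle: the PE locates AMPs via the hash map,
# each AMP reads only its own vdisk, and multi-AMP results are merged ("BYNET").

class Amp:
    def __init__(self):
        self.vdisk = {}                        # primary-index value -> row

    def retrieve(self, ids):
        return [self.vdisk[i] for i in ids if i in self.vdisk]

class ParsingEngine:
    def __init__(self, amps):
        self.amps = amps

    def locate(self, pi_value):                # stand-in for the hash-map lookup
        return self.amps[hash(pi_value) % len(self.amps)]

    def select_by_id(self, ids):
        # Group the requested PI values by owning AMP (single- or multi-AMP plan).
        plan = {}
        for i in ids:
            plan.setdefault(self.locate(i), []).append(i)
        partials = [amp.retrieve(wanted) for amp, wanted in plan.items()]
        # "BYNET merge": combine and sort the partial answer sets for the PE.
        return sorted(row for part in partials for row in part)

amps = [Amp() for _ in range(4)]
pe = ParsingEngine(amps)
for i in range(1, 9):                          # load rows with ids 1..8 across the AMPs
    pe.locate(i).vdisk[i] = (i, chr(ord("A") + i - 1))

print(pe.select_by_id([4]))                    # single-AMP request
print(pe.select_by_id([2, 8]))                 # multi-AMP request, merged result
```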


Data is distributed across all AMPs based on row-hash of PI

Data distribution and access methods

Hashing: Teradata uses hashing for data distribution and access. Each data row is hashed based on its primary index value, and the hash map directs the row to a particular AMP based on that hash value:

PI value -> row hash -> hash map -> AMP
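A small sketch of that distribution step (the hash function, bucket count, and round-robin hash map below are illustrative stand-ins; Teradata's actual hashing algorithm is proprietary), showing that rows land roughly evenly across the AMPs:

```python
import hashlib
from collections import Counter

NUM_AMPS = 4
NUM_BUCKETS = 65536                          # stand-in for the hash-map buckets

# Hash map: bucket number -> AMP (round-robin here; the real map is configured).
hash_map = {b: b % NUM_AMPS for b in range(NUM_BUCKETS)}

def row_hash(pi_value):
    # Illustrative stand-in for Teradata's proprietary 32-bit row hash.
    digest = hashlib.md5(str(pi_value).encode()).digest()
    return int.from_bytes(digest[:4], "big")

def amp_for(pi_value):
    bucket = row_hash(pi_value) % NUM_BUCKETS   # pick a hash-map bucket
    return hash_map[bucket]

# Distribute 100,000 rows by primary index value and check the spread.
counts = Counter(amp_for(i) for i in range(100_000))
print(counts)                                # roughly 25,000 rows per AMP
```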

Hashing and indexing

Indexing: a data value (or values, if the index is compound) from a row acts as an index key to that row. The index associates the index key with a relative row address that gives the location of the row on disk. Index entries are stored in order of their index key values and are said to be value-ordered.

Hashing:

The index key data value is transformed by a mathematical function to produce an abstract value that is not related to the original data value in any obvious way. Hashed data is assigned to hash buckets, and the hash map relates each hash code to an AMP location in a 1:1 manner. There is no obvious correspondence between a hash code and the location of the row it refers to.
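To make the contrast concrete, here is a minimal sketch (plain Python, not Teradata internals): a value-ordered index keeps keys in key order, so neighbouring key values sit next to each other and range scans are cheap, while hashing the same keys produces codes with no obvious relation to the original values.

```python
import bisect
import zlib

keys = [5, 17, 23, 42, 99]

# Value-ordered index: keys kept sorted, so lookups and range scans can use
# binary search and adjacent key values are stored next to each other.
sorted_keys = sorted(keys)
print("position of 23:", bisect.bisect_left(sorted_keys, 23))   # neighbours are 17 and 42

# Hashing: the same keys map to codes with no obvious relation to their values,
# which spreads rows evenly but gives up cheap range scans.
for k in keys:
    print(k, "->", zlib.crc32(str(k).encode()))
```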

Teradata does not use traditional indexing: what we refer to as indexes are either row hash values or data tables (join indexes).

Tradeoffs between hashing and indexing:

- Hashing is far better suited to the parallel database architecture.
- Hashing provides consistently better performance because rows are always distributed evenly across the AMPs.
- Primary indexes are not stored in an index subtable; they are stored directly as part of the row data.
- Primary index columns on frequently used join constraints allow the joining rows to be co-located on the same AMP.
- Value-ordered indexing remains better suited to range queries and to retrievals whose selection criteria involve only part of a multicolumn hash key.

Hashing

Teradata Database hashing algorithms are proprietary mathematical functions that transform an input data value of any length into a 32-bit row hash value; 32 bits provide about 4.2 billion possible values. The first 16 bits of the row hash form the 16-bit Destination Selection Word (the hash bucket number) that the hash map uses to select an AMP, and the row hash forms part of the Row ID used to locate the row on that AMP.
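A hedged sketch of that arithmetic (CRC-32 stands in for the proprietary hash, and the uniqueness value is just a constant here; the exact Row ID layout is simplified):

```python
import zlib

def row_hash(pi_value):
    # CRC-32 as a stand-in for Teradata's proprietary 32-bit hashing algorithm.
    return zlib.crc32(str(pi_value).encode()) & 0xFFFFFFFF

h = row_hash("cust-1001")
dsw = h >> 16                     # first (high-order) 16 bits: Destination Selection Word
print(f"row hash:          {h:#010x}")
print(f"DSW / hash bucket: {dsw:#06x}  -> looked up in the hash map to pick the AMP")

# The row hash, extended with a uniqueness value assigned on the AMP, forms the
# Row ID that identifies one specific row (layout simplified for illustration).
row_id = (h, 1)
print("row id:", row_id)
```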