Teradata Interview

37
These are some tricky questions asked about explain to see if an candidate has really worked on query tuning and how much he would know about optimizer & explain 1. What does ” Pseudo Table” Locks mean? It is a false lock which is applied on the table to prevent two users from getting conflicting locks with all-AMP requests. PE will determine an particular AMP to manage all AMP LOCK requests for given table and Put Pseudo lock on the table. To put in Simple terms , its like an University with 10 gates and at a given time you can enter the university through one gate ( or main gate) for security check. 2. What is residual condition? Residual condition means a filter is applied on particular table to limit the number of rows fetched into Spool. Say for example Sel * from EMP.Employee where emp_sal > 10000; Here , residual condition will act upon to fetch only employees with salary greater than 10000 3. Spool “ Last use” The particular spool file used in the step will be used for last time and the spool space will be released since it is no longer used in further steps of the query execution. Why is BTET transaction processing overhead in Teradata? Answer: - BTET makes all the queries running between BT and ET as single transaction . If any of query fails then the entire set of queries will not be committed. - BTET also has an overhead with locking , since it holds locks on tables till the ET is occured or all the queries have executed successfully - DDL statements cannot be used everywhere in BTET processing , but these statements can be given towards the end of BTET transaction. - Using large number of BTET caused transient Journal to grow and has tendancy for System Restarts What are the options that are not available for global temporary tables ? GTT definitions are created in Data dictionary. These table provide separate instance to each user who refers to them . The following options are not available for global temporary tables: - Any kind of constraints like check/referential cannot be applied on table

description

Interview Questions Teradata

Transcript of Teradata Interview

Page 1: Teradata Interview

These are some tricky questions asked about explain to see if an candidate has really worked on query tuning and how much he would know about optimizer & explain 1.            What does ” Pseudo Table” Locks mean? It is a false lock which is applied on the table to prevent  two users from getting conflicting locks with all-AMP requests.PE will determine an particular AMP to manage all AMP LOCK requests for given table and Put Pseudo lock on the table.

To put in Simple terms , its like an University with 10 gates and at a given time you can enter the university through one gate ( or main gate) for security check.

 2.            What is residual condition?Residual condition means a filter is applied on particular table to limit the number of rows fetched into Spool.Say for exampleSel * from EMP.Employee where emp_sal > 10000;

Here , residual condition will act upon to fetch only employees with salary greater than 10000

3.            Spool “ Last use”The particular spool file used in the step will be used for last time and the spool space will be released since it is no longer used in further steps of the query execution.

Why is BTET transaction processing overhead in Teradata?

Answer:- BTET makes all the queries running between BT and ET as single transaction . If any of query fails then the entire set of queries will not be committed.

- BTET also has an overhead with locking , since it holds locks on tables till the ET is occured or all the queries have executed successfully

- DDL statements cannot be used  everywhere in BTET processing , but these statements can be given towards the end of BTET transaction.

- Using large number of BTET caused transient Journal to grow and has tendancy for System Restarts

What are the options that are not available for global temporary tables ?GTT  definitions are created in Data dictionary.  These table provide separate instance to each user who refers to them .

The following options are not available for global temporary tables:- Any kind of constraints like check/referential cannot be applied on table- Identity columns since data in GTT are materialized only for session- Permanent Journaling cannot be done as data in tables are instances only to user in that session- PPI cannot be applied as data does not get stored in PERM , only TEMP space is utilized here.What are the options not available for volatile tables in teradata ?The following options are not available for volatile tables because table definition is not stored in data dictionary 

Page 2: Teradata Interview

- Default values for columns  - Title clause for columns- Named Indexes for table- Compression on columns/table level since table data are spooled- Stats cannot be collected since data is materialized only for session  - Identity columns as these again would need entry in IDcol tables- PPI cannot be done on tables- Any kind of constraints like check/referential cannot be applied on table

What are permanent journals in teradata?- Journals are used to capture information about table in Teradata.  In case of Permanent journals they capture details of Journal enabled tables in teradata   with all the pre transaction and post transaction details .  - Journal tables are assigned PERM space and they reside in same database as of parent or they can reside on different database.- They are mainly used for protection of data and sometimes  also for disaster recovery ( fallback is better in this case )- Permanent journal tables can be enabled or disabled by running alter database <databasename> 'no journal' /' journal = <databasename.jrnltbl>'- Arcmain  utility provides the feature of backing  up Journal tables - We can find details about all journal tables present in  teradata  database using DBC.JOURNALS table.

Query to find skew factor of a particular table ?SELECT TABLENAME,SUM(CURRENTPERM) /1024/1024 AS CURRENTPERM, (100 - (AVG(CURRENTPERM)/MAX(CURRENTPERM)*100)) AS SKEWFACTOR FROM DBC.TABLESIZE WHERE DATABASENAME= <DATABASENAME> AND TABLENAME =<TABLENAME>  GROUP BY 1;

What is the acceptable range for skew factor in a table?

There is no particular range for skew factor.  In case of production systems, it is suggested to keep skew factor between 5-10.  There are various considerations for skew factor- Number of AMPS- Size of row in a table- number of records in a table- PI of a table- Frequent access of table (Performance consideration)- whether table getting loaded daily /monthly or how frequently data is being refreshed

What is multivalued compression in Teradata?

Page 3: Teradata Interview

Multivalued compression or just MVC is a compression technique applied on columns in Teradata .  MVC has a unique feature of compressing up-to 255 distinct values per column in a given table.  

The advantage of compression are

Reduced Storage cost by storing more of a logical data than physical data.

Performance is greatly improves due to  reduced retrieval of physical data for that column.   

Tables having compression always have an advantage since optimizer considers reduced I/O as one of the factors for generating EXPLAIN plan.

What are the ways by which we can use zero to replace a null value for a given column ?Answer - 

1. By using Teradata SQL supported command as followsSelect Col1, ZEROIFNULL(Col2)  from Table_name;

2. By using ANSI SQL command as followsa. CoalesceSelect Col1,COALESCE(Col2,0)    from Table_name;

b.Case operator select Case When Col2 IS NOT NULL Then Col2Else 0End from Table_name;

It is always suggested to use ANSI standard while coding in Teradata , since any changes in Teradata version due to upgrade/patches installation will lead to- Time for regression testing- rework of code.

First love teaches us not how to love, but how to live

What is a join index ? What are benefits of using Join index?Answer:

It is a index that is maintained in a system .It maintains rows  joined on two or more tables. Join index is useful for queries where the index structure contains all the columns referenced by one or more joins, thereby allowing the index to cover all or part of the query

Benefits if using join index is- to eliminate base table access- Aggregate processing is eliminated by creating aggregate join index- It reduces redistribution of data since data is materialized by JI.- Reduces complex join conditions by using covering queries

Page 4: Teradata Interview

Can a macro be called inside a macro?Answer:The main purpose of run a set of repeated sql queries.   Macro supports only DML queries .Hence We cant call any-other macro or not even a procedure in a macro.

One trick to have closest functionality of this is to copy all the sql queries from macro2 to macro1 and add parameters if it is necessary as shown below.

replace macro1( val int)as(  sel * from employee_table where empid= :val );

replace macro2( dept_no int)as(  sel * from employee_table where deptno= :dept_no );

so, to call a macro2 inside a macro1..it is not possible. Hence follow this approach

Replace macro1 (val int, dept_no int)as(sel * from employee_table where empid= :val;sel * from employee_table where deptno= :dept_no ;);

How do you find the list of employees named "john" in an employee table without using Like operator??This question seems tricky.. but yes there is another way by which we can find names/patters without using like operator.By using "BETWEEN" , we can find the lsit of employees named john... sel * from employee where name betwee 'J' and 'K';

But at times usage of between is tricky, if there are other employees starting with J, those employees will also be listed by this query.

how do you list all the objects available in given database?1.)select * from dbc.tables where databasename='<DATABASENAME>';

2.) By running a normal help command on that database as follows.  help database <DATABASENAME';

What are different types of Spaces available in Teradata ?There are 3 types of Spaces available in teradata ,they are

1. Perm space 

Page 5: Teradata Interview

-This is disk space used for storing user data rows in any tables located on the database. -Both Users & databases can be given perm space.-This Space is not pre-allocated , it is used up when the  data rows are stored on disk.

2.Spool Space -It is a  temporary workspace which is used for processing Rows for given SQL statements.-Spool space is assigned only to users . --Once the SQL processing is complete the spool is freed and given to some other query.-Unused Perm space is automatically available for Spool .  

3. TEMP space-It is allocated to any databases/users where Global temporary tables are created and data is stored in them.-Unused perm space is available for TEMP space

What is hash collision ?

This occurs when there is same hash value generated for two different Primary Index Values. It is a rare occurance and Has been taken care in future versions of TD.

What is RAID, What are the types of RAID?

Redundant Array of Inexpensive Disks (RAID)  is a type of protection available in Teradata. RAID  provides Data protection at the disk Drive level.  It ensures data is available even when the disk drive had failed.

Th\ere are around 6 levels of RAID ( RAID0 to RAID5) .  Teradata supports Two levels of RAID protectionRAID 1 - Mirrored copy of data in other diskRAID 5 - Parity bit (XOR) based Data protection on each disk array.

One of the major overhead's of RAID is Space consumption

How do you find the No of AMP 's in the teradata Database?

1.) You can do a SELECT HASHAMP()+1 to get the total number of Amps in the  given teradata system.2.) Details about amps can also be checked in Configuration management on  teradata PMON tool.

How do you see a DDL for an existing table in Teradata?

By using show table command  as followsshow table tablename ;

This will display DDL structure of table along with following detailsFallback

Page 6: Teradata Interview

Before/after journalset/multiset table typeIndex(PI,SI,PPI)details about column - datatype ,default,identity/primary -foreign key .

How to find duplicates in a table?To find duplicates in the table , we can use group by function on those columns which are to be used and then listing them if their count is >1 .

Following sample query can be used to find duplicates in table having  3 columnsselect col1, col2,col3, count(*) from table group by col1, col2, col3  having count  (*) > 1 ;

what are different types of journals in teradata?There are  3 different types of journals available in Teradata. They are

1. Transient Journal  - This maintains current transaction history. Once the query is successful it deletes entries from its table .  If the current query transaction fails, It rolls back data from its table.

2. Permanent Journal  - This is defined when a table is created.  It can store BEFORE or AFTER image of tables. DUAL copies can also be stored. Permanent Journal maintains Complete history of table.

3.Down AMP recovery Journal (DARJ)  - This journal activates when the AMP which was supposed to process goes down.  This journal will store all entries for AMP which went down.  Once that AMP is up, DARJ copies all entries to that AMP and makes that AMP is sync with current environment.

What are the reasons for product joins ?1. Stale or no stats causing optimizer to use product join 2. Improper usage of aliases in the query.3. missing where clause ( or Cartesian product join  1=1 )4. non equality conditions like > ,< , between   example ( date)5. few join conditions6.  when or conditions are used.

last but not the least   product joins are not bad always!! sometimes PJ are better compared to other types of joins.

How to rename columns using views?Create a view which is a subset of the employee table as follows.Renaming columns in views will help increase security of sensitive tables  and hiding columns under alias names

Page 7: Teradata Interview

Replace  view EmployeeV( number,fullname,addr,phno,depnum,sal, expr) as locking row for access  selEmpNo ,                        Name  ,                        Address,                       Phone   ,                      DeptNo   ,                     Salary    ,                    YrsExp                       from employee;

When we run sel * form employeeV, Only alias columns are displayed .

What are Restrictions on Views in Teradata?An index cannot be Created on a view.It cannot contain an ORDER BY clause.All the derived columns and aggregate columns used in the view  must have an AS clause (alias defined).A view cannot be used to UPDATE if it contains the following :        – Data from more than one table (JOIN VIEW)        – The same column twice        – Derived columns        – A DISTINCT clause        – A GROUP BY clause

how do you you implement Multi valued compression in an existing table?

MVC can be implemented on following conditions1. A new column with multi valued compression can be added to an existing table, but cannot modify existing compressed column.2. Create a new table with column which has MVC and do insert .. select from original table     Or  CREATE TABLE... as with column designated MVC.

list Built-in functions used in teradata ?

The main functionality of built in functions is  that they dont need any arguments or paramaters and can be used directly with select to return system values.Some of them are

• SESSION: – Returns a number for the session for current user .• TIME: – this function provides the current time based on a 24-hour day• USER: –  This one gives the user name of the current user.• ACCOUNT: – display's your Teradata Account information 

Page 8: Teradata Interview

• CURRENT_DATE: – Returns the current system date • CURRENT_TIME: - This function returns the current system time and current session ‘Time Zone’ displacement. • CURRENT_TIMESTAMP: - Returns the current system timestamp  with TimeZone• DATABASE: –  It  returns the name of the default database for the current user.• DATE: – same as Current_DATE   and is teradata built in .

How do you whether table is locked or not?Just run the following query on the table.

Lock Table DBNAME.TABLENAME write nowaitSelect * from DBNAME.TABLENAME;

If this table is locked currently then , then the above statement would fail as Nowait will not wait for locks to be released on that  table .

What are advantages of compression on tables?- They take less physical space then uncompressed columns hence reducing space cost- They improve  system performance as less data will be retrieved per row fetched , more data is fetched per data block thus increasing data loading speed- They reduce overall I/O 

How many error tables are there in fload and Mload and what is their significance/use?

How many error tables are there in fload and Mload and what is their significance/use?

Answers: 

Fload uses 2 error tables

ET TABLE 1: where format of data is not correct.

ET TABLE 2: violations of UPI

It maintains only error field name, error code and data parcel only.

Mload also uses 2 error tables (ET and UV), 1 work table and 1 log table

1. ET TABLE - Data error

MultiLoad uses the ET table, also called the Acquisition Phase error table, to store data errors found during the acquisition phase of a MultiLoad import task.

Page 9: Teradata Interview

2. UV TABLE - UPI violations

MultiLoad uses the UV table, also called the Application Phase error table, to store data errors found during the application phase of a MultiLoad import or delete task

Apart from error tables, it also has work and log tables

3. WORK TABLE - WT

Mload loads the selected records in the work table

4. LOG TABLE

A log table maintains record of all checkpoints related to the load job, it is essential/mandatory to specify a log table in mload job. This table will be useful in case you have a job abort or restart due to any reason.

How do you find out number of AMP's in the Given system

How do you find out number of AMP's in the Given system

Answer

1.running following query in queryman

Select HASHAMP () +1;

2. We can find out complete configuration details of nodes and amps in configuration screen of Performance monitor

What are the difference types of temporary tables in Teradata?

What are the difference types of temporary tables in Teradata?

Answers:

Page 10: Teradata Interview

a.       Global temporary tables

b.       Volatile temporary tables

c.       Derived tables

Global Temporary tables (GTT) –

1. When they are created, its definition goes into Data Dictionary.

2. When materialized data goes in temp space.

3. That's why, data is active up to the session ends, and definition will remain there up-to its not dropped using Drop table statement. If dropped from some other session then its should be Drop table all;

4. You can collect stats on GTT.

5. Defined with the CREATE GLOBAL TEMPORARY TABLE sql

Volatile Temporary tables (VTT) -

1. Local to a session (deleted automatically when the session terminates)

2. Table Definition is stored in System cache .A permanent table definition is stored in the DBC data dictionary database (DBC.Temptables) .

3. Data is stored in spool space.

4. That’s why; data and table definition both are active only up to session ends.

5. No collect stats for VTT.If you are using volatile table, you can not put the default values on column level (while creating table)

6. Created by the CREATE VOLATILE TABLE sql statement

Derived tables

1 Derived tables are local to an SQL query.

2 Not included in the DBC data dictionary database, the definition is kept in cache.

3 They are specified on a query level with an AS keyword in an sql statement

List types of HASH functions used in Teradata?

List types of HASH functions used in Teradata?

Page 11: Teradata Interview

Answer:

SELECT HASHAMP (HASHBUCKET (HASHROW ())) AS “AMP#”, COUNT (*) FROM GROUP BY 1 ORDER BY 2 DESC;

There are HASHROW, HASHBUCKET, HASHAMP and HASHBAKAMP.

The SQL hash functions are:

    * HASHROW (column(s))

    * HASHBUCKET (hashrow)

    * HASHAMP (hashbucket)

    * HASHBAKAMP (hashbucket)

Example:

SELECT

            HASHROW ('Teradata')   AS "Hash Value"

            , HASHBUCKET (HASHROW ('Teradata')) AS "Bucket Num"

            , HASHAMP (HASHBUCKET (HASHROW ('Teradata'))) AS "AMP Num"

            , HASHBAKAMP (HASHBUCKET (HASHROW ('Teradata')))  AS "AMP Fallback Num" ;

This is really good, by looking into the result set of above written query you can easily find out the Data Distribution across all AMPs in your system and further you can easily identify un-even data distribution

What are the advantages and dis-advantages of secondary Indexes?

What are the advantages and dis-advantages of secondary Indexes?

Answer:

Advantages:

1. A secondary index might be created and dropped dynamically

2.A table may have up to 32 secondary indexes.

Page 12: Teradata Interview

3. Secondary index can be created on any column. .Either Unique or Non-Unique

4. It is used as alternate path or Least frequently used cases.  ex. defining SI on non indexed column can improve the performance, if it is used in  join or filter condition of a given query.

5. Collecting Statistics on SI columns make sure Optimizer choses SI if it is better than doing Full Table Scans

Disadvantages

1. Since Sub tables are to be created, there is always an overhead for additional spaces.

2. They require additional I/Os to maintain their sub tables.

3. The Optimizer may, or may not, use a NUSI, depending on its selectivity.

4. If the base table is Fallback, the secondary index sub table is Fallback as well.

5. If statistics are not collected accordingly, then the optimizer would go for Full Table Scan.

Where does TD store transient journal?

Where does TD store transient journal?

In perm space -> dbc.transientjournal

But that special table can grow over dbc's perm limit until the whole system runs out of perm space.

How to select first N Records in Teradata?

How to select first N Records in Teradata?

Answers:

To select N records in Teradata you can use RANK function. Query syntax would be as follows

SELECT BOOK_NAME, BOOK_COUNT, RANK(BOOK_COUNT) A FROM LIBRARY QUALIFY A <= 10;

How to view every column and the columns contained in indexes in Teradata?How to view every column and the columns contained in indexes in Teradata?

Page 13: Teradata Interview

Answers:

Following query describes each column in the Teradata RDBMS

SELECT * FROM DBC.TVFields;

Following query describes columns contained in indexes in the Teradata RDBMS

SELECT * FROM DBC.Indexes;

How Teradata makes sure that there are no duplicate rows being inserted when it’s a SET table?

How Teradata makes sure that there are no duplicate rows being inserted when it’s a SET table?

Answers:

Teradata will redirect the new inserted row as per its PI to the target AMP (on the basis of its row hash value), and if it find same row hash value in that AMP (hash synonyms) then it start comparing the whole row, and find out if duplicate.

If it’s a duplicate it silently skips it without throwing any error.

What is a clique?

What is a clique?

A clique is a set of Teradata nodes that share a common set of disk arrays which are connected in daisy chain network to each disk array controller.

Cliques provide data accessibility if a node fails for any reason,  Proms are distributed across all nodes in the system. Large multiple node systems will have clique mechanisms associated with them

What is the difference between MultiLoad & Fastload interns of Performance?  

What is the difference between MultiLoad & Fastload interns of Performance?

Answers:

Page 14: Teradata Interview

If you want to load, empty table then you use the fastload, so it will very useful than the MultiLoad ,because fastload performs the loading of the data in 2phase and its no need a work table for loading the data .

So it is faster as well as it follows the below steps to load the data in the table

Phase1 - It moves all the records to the entire AMP first without any hashing

Phase2 - After giving end loading command, Amp will hashes the record and send it to the appropriate AMPS.

MultiLoad:

It does the loading in the 5 phases

Phase1 - It will get the import file and checks the script

Phase2 - It reads the record from the base table and store in the work table

Phase3 - In this acquisition phase it locks the table header

Phase4 - In the DML operation will done in the tables

Phase5 - In this table locks will be released and work tables will be dropped.

How does indexing improve query performance? How does indexing improve query performance?

Answers:

Indexing is a way to physically reorganize the records to enable some frequently used queries to run faster.

The index can be used as a pointer to the large table. It helps to locate the required row quickly and then return it back to the user.

Or

The frequently used queries need not hit a large table for data. They can get what they want from the index itself. - cover queries.

Index comes with the overhead of maintenance. Teradata maintains its index by itself. Each time an insert/update/delete is done on the table the indexes will also need to be updated and maintained.

Indexes cannot be accessed directly by users. Only the optimizer has access to the index.

What is error table? What is the use of error table?

Page 15: Teradata Interview

What is error table? What is the use of error table?

Answers:

The Error Table contains information concerning:

- Data conversion errors Constraint violations and other error conditions:

* Contains rows which failed to be manipulated due to constraint violations or Translation error

* Captures rows that contain duplicate Values for UPIs.

* It logs errors & exceptions that occurs during the apply phase.

* It logs errors that are occurs during the acquisition phase.

How to find out list of indexes in Teradata?

How to find out list of indexes in Teradata?

IndexType Description

P Nonpartitioned Primary

Q Partitioned Primary

S Secondary

J join index

N hash index

K primary key

U unique constraint

V value ordered secondary

H hash ordered ALL covering secondary

O valued ordered ALL covering secondary

I ordering column of a composite secondary index

M Multi column statistics

D Derived column partition statistics

Page 16: Teradata Interview

1 field1 column of a join or hash index

2 field2 column of a join or hash index

SELECT

databasename, tablename, columnname, indextype, indexnumber, indexname

FROM  

dbc.indices

ORDER BY

databasename,  tablename, indexnumber;

When should the statistics be collected?

When should the statistics be collected?

Here are some excellent guidelines on when to collect statistics:

·       All Non-Unique indices

·       Non-index join columns

·       The Primary Index of small tables

·       Primary Index of a Join Index

·       Secondary Indices defined on any join index

·       Join index columns that frequently appear on any additional join index columns that frequently appear in WHERE search conditions

·       Columns that frequently appear in WHERE search conditions or in the WHERE clause of joins.

How to make sure Optimizer chooses NUSI over Full Table Scan (FTS)?

How to make sure Optimizer chooses NUSI over Full Table Scan (FTS)?

Answer:

Page 17: Teradata Interview

A optimizer would prefer FTS over NUSI, when there are no Statistics defined on NUSI columns.It would prefer scanning for full table instead of going for Subtables in NUSI  since optimized does not have information about subtables of NUSI

It is always suggested to collect statistics whenever NUSI columns are defined on the table.

Verify whether index is being used by checking in Explain plan.

What are the basic rules that define how PI is defined in Teradata?

What are the basic rules that define how PI is defined in Teradata?

Answer:

The following rules govern how Primary Indexes in a Teradata Database must be defined as well as how they function:

One Primary Index per table.

A Primary Index value can be unique or non-unique.

The Primary Index value can be NULL.

The Primary Index value can be modified.

The Primary Index of a populated table cannot be modified.

A Primary Index has a limit of 64 columns. 

What are the basic criteria to select Primary Index column for a given table?What are the basic criteria to select Primary Index column for a given table?

Answer:

A thumb rule of ADV demographics is followed.

Access Demographics

Identify index candidates that maximize one-AMP operations.

Columns most frequently used for access (Value and Join).

Distribution Demographics

Identify index candidates that optimize parallel processing.

Columns that provide good distribution.

Page 18: Teradata Interview

Volatility Demographics

Identify index candidates with low maintenance I/O.

What is explain in teradata?

What is explain in teradata?

The EXPLAIN facility is a teradata extension that provides you with an "ENGLISH" translation of thesteps chosen by the optimizer to execute an SQL statement. It may be used on any valid teradata database with a preface called "EXPLAIN".

The following is an example:-EXPLAIN select last_name first_name FROM employees;

1. The EXPLAIN parses the SQL statement but does not execute it.

2. This provides the designer with an "execution strategy".

3. The execution strategy provides what an optimizer does but not why it chooses them.

4. The EXPLAIN facility is used to analyze all joins and complex queries.

What are the different return codes(severity errors) in Teradata utilities?

What are the different return codes(severity errors)  in Teradata utilities?

There are 3 basic return codes (severity errors) in teradata utilities.

0 - success

4 - Warning

8 - User error

12 - System error16 - system error   

Please note that apart from this there are separate error codes for each of the error  returned from sql queries.

For more details on error codes and fixing them please refer  to General reference manual  available from teradata documentation section in the teradata site.

what are the different date formats available in Teradata system?

Page 19: Teradata Interview

what are the  different date formats available in Teradata system?

Answer:There are two different date formats available ,  they are  The Teradata default format is: YY/MM/DD

The ANSI display format is: YYYY-MM-DD

How do you set default date setting in BTEQ?

How do you set default date setting in BTEQ?

Answer

There are two default date setting in BTEQ.  They have to be set after logging on to the session

set session dateform = ANSIDATE ;  /*format is yyyy-mm-dd */                

set session dateform = integerdate ; /* format is yy/mm/dd   -teradata date format */

What does DROP table command do?What does DROP table table_name command do?

It deletes all data in table_name

Removes the definition from the data dictionary

Removes all explicit access rights on the table

Is Like comparison case-sensitive in Teradata?

Is Like comparison case-sensitive in Teradata?

Answer

LIKE operator is not case sensitive in Teradata session mode.

Consider the following example  

Select F_name from employee where F_name like '%JO%’;

The following query will pick values matching with 'JO' and 'jo' as well, since Teradata is not case-sensitive

Page 20: Teradata Interview

To overcome this problem, a new function called "CASESPECIFIC" is used in TERADATA as follows

Select F_name from employee where F_name (CASESPECIFIC) like '%JO%’;

How do you submit bteq script (batch mode)?

How do you submit bteq script (batch mode)?

Answer

1. Start the BTEQ , by typing BTEQ

2. Enter the following command

            .run file = BTEQScript.btq

OR

1. Bteq < BTEQScript.btq

BTEQScript.btq contains following

 .logon 127.0.0.1/dbc, dbc;

sel top 10 * from dbc.tables;

.quit

What is the command in BTEQ to check for session settings ?What is the command in BTEQ to check for session settings ?

Answer

The BTEQ .SHOW CONTROL command displays BTEQ settings.

The .SHOW CONTROL command shows all BTEQ session parameters

Example

 [SET] FOLDLINE          = OFF ALL                                              

 [SET] FOOTING           = NULL                                                 

Page 21: Teradata Interview

 [SET] FORMAT            = OFF                                                  

 [SET] FORMCHAR          = OFF                                                  

 [SET] HEADING           = NULL                                                 

 [SET] INDICDATA         = OFF                                                  

 [SET] NOTIFY            = OFF                                                  

 [SET] NULL              = ?                                                    

 [SET] OMIT              = OFF ALL                                              

 [SET] PAGEBREAK         = OFF ALL                                              

 [SET] PAGELENGTH        = 55                                                   

 [SET] QUIET             = OFF                                                  

 [SET] RECORDMODE        = OFF                                                  

 [SET] RETCANCEL         = OFF                                                  

 [SET] RETLIMIT          = No Limit                                             

 [SET] RETRY             = ON                                                   

 [SET] RTITLE            = NULL                                                 

 [SET] SECURITY          = NONE                                                 

 [SET] SEPARATOR         = two blanks                                           

 [SET] SESSION CHARSET   = ASCII                                                

 [SET] SESSION SQLFLAG   = NONE                                                 

 [SET] SESSION TRANSACTION = BTET                                               

 [SET] SESSIONS          = 1                                                    

 [SET] SIDETITLES        = OFF for the normal report. 

 [SET] SKIPDOUBLE        = OFF ALL                                             

 [SET] SKIPLINE          = OFF ALL                                             

 [SET] SUPPRESS          = OFF ALL                                             

 [SET] TDP               = l5442                                               

 [SET] TITLEDASHES       = ON for the normal report.                           

       And, it is   ON for results of WITH clause number: 1 2 3 4 5 6 7 8 9.   

 [SET] UNDERLINE         = OFF ALL                                              [SET] WIDTH             = 75

Page 22: Teradata Interview

What are benefits of Permanent Journal?What are the benefits of Permanent Journal?

The benefits of Permanent Journal are Permits capture of before images for database rollback.

Permits capture of after images for database roll forward.

Permits archiving change images during table maintenance.

Reduces need for full-table backups.

Provides a means of recovering NO FALLBACK tables.

Requires additional disk space for change images.

Requires user intervention for archive and recovery activity

What are the benefits of fallback?

What are the benefits of fallback?

The benefits of fallback are Protects your data from hardware (disk) failure.

Protects your data from software (node) failure.

Automatically recovers with minimum recovery time, after repairs or fixes are complete

What’s the difference between TIMESTAMP (0) and TIMESTAMP (6)?

What’s the difference between TIMESTAMP (0) and TIMESTAMP (6)?

TIMESTAMP (0) is CHAR (19) and TIMESTAMP (6) is CHAR (26)

Timestamp(0) is YYYY-MM-DDbHH:MI:SS

Timestamp(6) is YYYY-MM-DDbHH:MI:SS.ssssss  ( milliseconds extra)

What is a Dirty-Read or Stale-Read Lock?

What is a Dirty-Read or Stale-Read Lock?

This occurs when a access lock is applied on the table which is doing a update.

May produce erroneous results if performed during table maintenance resulting in Dirty Read or stale read , which might  result in inconsistent result set.

Difference between BTEQ and Sql assistant (query man)?

Page 23: Teradata Interview

Differences between BTEQ and Sql assistant (query man)?

BTEQ Basic Teradata Query utility

SQL front-end

Report writing and formatting features

Interactive and batch queries

Import/Export across all platforms

The default number of sessions, upon login, is 1.

 Teradata Query Manager / Queryman / TeradataSQL Assistant SQL front-end for ODBC compliant databases

Historical record of queries including:

-         Timings-         Status-         Row counts

Random sampling feature

Limit amount of data returned

Import/Export between database and PC

Export to EXCEL or ACCESS

How do you execute the given SQL statement repeatedly in BTEQ?How do you execute the given SQL statement repeatedly in BTEQ?Answer:

Select top 1* from database.table1;=nHere "=n” is to run the previous sql statement, "n" number of times.

What are types of PARTITION PRIMARY INDEX (PPI) in Teradata?What are types of PARTITION PRIMARY INDEX (PPI) in Teradata?Answer:

1. Partition by CASECREATE      TABLE ORDER_Table(ORD_number  integer NOT NULL,customer_number integer NOT NULL,order_date  date ,order_total integer)PRIMARY INDEX (customer_number)PARTITION BY case1 (            order_total < 10000 ,            order_total < 20000 ,            order_total < 30000,

Page 24: Teradata Interview

NO           CASE     OR        UNKNOWN ) ;

2. Partition by Range  - example using date rangeCREATE TABLE ORDER_Table(ORD_number  integer NOT NULL,customer_number integer NOT NULL,order_date  date ,order_total integer)PRIMARY INDEX (customer_number)PARTITION BY range1 (            Order_date BETWEEN date '2010-01-01'       AND      date '2010-12-01'            EACH interval '1' month ,NO RANGE        OR  UNKNOWN);

P.S:If  we use NO RANGE or NO CASE  - then all values not in this range will be in a single partition.If we specify UNKNOWN, then all null values will be placed in this partition

Can we define PARTITION PRIMARY INDEX (PPI) on a Primary Index column in Table? Explain Why?Can we define PARTITION PRIMARY INDEX (PPI) on a Primary Index column in Table? Explain Why?Answer:

PPI cannot be defined on PI column in Table.  Since PI is used to locate an AMP and store the data based on HASH value (ROW ID ) it cannot be used as PPI column.In case of PPI , The data stored in AMP's are Partitioned based on PPI column after they are row hashed  (ROW KEY = ROW ID +PPI value )

P.S: If you want to create UPI on table, then PPI column can be added as part of PI .

How to skip 1st record while using Bteq IMPORT?How to skip 1st record while using Bteq IMPORT?Answer:

By using SKIP=1 ;   , we can skip first record. .import infile=<filename>, skip=1;

What is TENACITY? What is its default value?What is TENACITY? What is its default value?Answer

TENACITY specifies the amount of time in hours, to retry to obtain a loader slot or to establish all requested sessions to logon. The default for Fast Load is “no tenacity”, meaning that it will not retry at all. If several FastLoad jobs are executed at the same time, we recommend setting the TENACITY to 4, meaning that the system will continue trying to logon for the number of sessions requested for up to four hours.

Page 25: Teradata Interview

What does SLEEP function does in Fast load?What does SLEEP function does in Fast load?Answer

The SLEEP command specifies the amount minutes to wait before retrying to logon and establish all sessions.Sleep command can be used with all load utilities not only fastload.This situation can occur if all of the loader slots are used or if the number of requested sessions is not available. The default value is 6 minutes. If tenacity was set to 2 hours and sleep 10 minutes, Sleep command will try to logon for every 10 minutes up to 2 hours duration.

What is Cross Join?What is Cross Join?Answer:

It is a Teradata specified Join, which is used as equivalent to product join.There is no “On” clause in case of CROSS joinSELECT  EMP.ename , DPT.DnameFROM     employee EMPCROSS JOINDepartment DPTWHEREEMp.deptno = DPT.depto ;

How many types of Index are present in teradata?How many types of Index are present in teradata?Answer:

There are 5 different indices present in Teradata1. Primary Index      a.Unique primary index      b. non Unique primary index2. Secondary Index      a. Unique Secondary index      b. non Unique Secondary index3. Partitioned Primary Index      a. Case partition (ex. age, salary...)      b. range partition ( ex. date)4. Join index      a. Single table join index      b. multiple table join index      c. Sparse Join index ( constraint applied on join index in where clause)5. Hash index

Difference between Stored Procedure and Macro?Difference between Stored Procedure and Macro?Answer:

Page 26: Teradata Interview

SP:It does not return rows to the user.It has to use cursors to fetch multiple rowsIt used Inout/Out to send values to userIt Contains comprehensive SPLIt is stored in DATABASE or USER PERMA stored procedure also provides output/Input capabilities

Macros:It returns set of rows to the user.It is stored in DBC SPOOL spaceA macro that allows only input values

What is a Sparse Index?What is a Sparse Index?Answer:

Sparse Join Indexes are a type of Join Index which contains a WHERE clause that reduces the number of rows which would otherwise be included in the index. All types of join indexes, including single table, multitable, simple or aggregate can be sparse.

How to handle nulls in Teradata??? How many columns can be there in a table???How to handle nulls in Teradata???How many columns can be there in a table???Answer

1. Use zeroifnull, nullifzero in select and NULL in insert directly.2. 256 columns max per table.

How to find average sal with out using avg function????How to find average sal with out using avg function????Answer

Without using "avg" we can find the avg salary by using sum (sal)/count (sal);sel sum(sal)/count(sal) as avgsal from tablename

What is difference B/w User and database in Teradata?What is difference B/w User and database in Teradata?Answer:

- User is a database with password but database cannot have password- Both can contain Tables , views and macros- Both users and databases may or may not hold privileges- Only users can login, establish a session with Teradata database and they can submit requests

How do you create materialized view in Teradata?

Page 27: Teradata Interview

How do you create materialized view in Teradata?Answer:

There is no such thing as a "materialized view" in Teradata. The equivalent in Teradata would be a join index (or aggregate index) with a corresponding view put on top of it. The command to create one is "CREATE JOIN INDEX...(lots of options)".

Join indices are maintained when the data is maintained. They cannot be accessed directly, but are invoked when the optimizer determines it is beneficial. Even though you cannot invoke a join index directly, you create a view that looks like the join index and in that way, it would be similar to having direct access. However, this approach does not guarantee that the join index will be used.

What are Differences between Teradata and ANSI Session modes in Teradata?What are Differences between Teradata and ANSI Session modes in Teradata?Answer:

TERADATA MODE-------------1. Comparison is not Case sensitive2. Create table are default to SET tables3. Each transaction is committed implicitly4. Supports all Teradata commands 5. It follows BTET (Begin and End Transaction) Mode

ANSI MODE-------------1. Comparison is CASE sensitive2. Create table are default to MULTISET tables3. Each transaction has to be committed explicitly4. Does not support all Teradata commands 5. It does not follow BTET Mode

What are the scenarios in which Full Table Scans occurs?What are the scenarios in which Full Table Scans occurs?

1. The where clause in SELECT statement does not use either primary index or secondary index2. SQL Statement which uses a partial value (like or not like), in the WHERE statement.3. SQL Statement which does not contain where clause.4. SQL statement using range in where clause. Ex. (col1 > 40 or col1 < =10000)

How to identify PPI columns?How to identify PPI columns?Answer:

Select databasename , tablename , columnposition ,columnname from dbc.indices

Page 28: Teradata Interview

where indextype ='Q'order by 1 ,2,3 ;

How to skip the header row in the fastload scriptHow to skip the header row in the fastload script

RECORD 2;   /* this skips first record in the source file */DEFINE ...

Explain types of re-distribution of data happening for joining of columns from two tablesExplain types of re-distribution of data happening for joining of columns from two tablesAnswer:

Case 1 - P.I = P.I joinsCase 2 - P.I = N.U.P.I joinsCase 3 - N.U.P.I = N.U.P.I joins

Case1 - there is no redistribution of data over amp's. Since amp local joins happen as data are present in same AMP and need not be re-distributed. These types of joins on unique primary index are very fast.Case2 - data from second table will be re-distributed on all amps since joins are happening on PI vs. NUPI column.  Ideal scenario is when small table is redistributed to be joined with large table records on same ampcase3 - data from both the tables are redistributed on all AMPs.  This is one of the longest processing queries , Care should be taken to see that stats are collected on these columns

Can you load multiple data files for same target table using Fastload?Can you load multiple data files for same target table using Fastload?Answer:

Yes, we can Load a table using multiple datafiles in Fastload.

Before giving "end loading" statement user can define file path and use insert sql for multiple source files and give "end loading" statement at the end

Why does varchar occupy 2 extra bytes?Why does varchar occupy 2 extra bytes?Answer

The two bytes are for the number of bytes for the binary length of the field.It stores the exact no of characters stored in varchar

How many types of Skew exist?How many types of Skew exist?Answer

If you utilized unequally TD resources (CPU,AMP,IO,Disk and etc) this is called skew exists. Major are 3 types of skews (CPU skew, AMP/Data skew, IO Skew).

Page 29: Teradata Interview

-Data skew?When data is not distributed equally on all the AMPs.-Cpu skew?Who is taking/consuming more CPU called cpu skew.-IO skew?Who perform more IO Operation? Resulting in IO Skew

Why Fload doesn’t support multiset table?Why Fload doesn’t support multiset table?Answer

Fload does not support Multiset table because of following reason.

Say, the fastload job fails. Till the fastload failed, some number of rows was sent to the AMP's.Now if you restart FLOAD, it would start loading record from the  last checkpoint and some of the consecutive rows are sent for the second time. These will be caught as duplicate rows are found after sorting of data.   This restart logic is the reason that Fastload will not load duplicate rows into a MULTISET table. It assumes they are duplicates because of this logic. Fastload support Multiset table but does not support the duplicate rows. Multiset tables are tables that allow duplicate rows. When Fastload finds the duplicate rows it discards it. Fast Load can load data into multiset table but will not load the duplicate rows.

What is Global Temporary table? What is the use of this Global Temporary table?Can we take collect stats on Derived Tables and Volatile tables and Temporary tables?What is Global Temporary table?What is the use of this Global Temporary table?

Answer1. No for volatile and derived tables and yes for global tables.2. Global tables are temp tables like volatile tables but unlike volatile tables, their definitions are retained in dd.3. It is used whenever there is a need for a temporary table with same table definition for all users.

What is the default join strategy in Teradata???What is the default join strategy in Teradata???Answer# 1The Teradata Cost based optimizer will decide the join strategy based on the optimum path. The common strategies followed are from Merge, Hash & Nested Join

Answer # 2 Three strategies followed by optimizer are:1. Duplication of rows of one table to every amp--> This one is opted by optimizer when the non-PI column is on a small table.2. Redistribution of the non PI rows of the table to the amp containing the matching PI row amp.

Page 30: Teradata Interview

--> This one is opted when the tables are large. The non PI column is in one table is redistributed to the amp containing the matching PI.3. Redistribute both the rows of the table by hash values.--> This is chosen when the join is on a column that is not the PI in either table. This will require the most spool space.

What do High confidence, Low confidence and No confidence mean in EXPLAIN plan?What do High confidence, Low confidence and No confidence mean in EXPLAIN plan?Answer

Explain gives the execution strategy means what are the different steps that the query will go through.HIGH CONFIDENCE:  Statistics are collected.LOW CONFIDENCE:   Statistics are not collected.  But the where condition is having the condition on indexed column. Then estimations can be based on sampling.NO CONFIDENCE: Statistics are not collected and the condition is on non indexed column.

How to Skip or Get first and Last Record from Flat File through MultiLoad?How to Skip or Get first and Last Record from Flat File through MultiLoad?Answer

In .IMPORT command in Mload we have a option to give record no. from which processing should begin. i.e. ‘FROM m’ ‘m’ is a logical record number, as an integer, of the record in the identified data source where processing is to begin. You can mention ’m’ as 2 and processing will  start from second record.

THRU k and FOR n are two options in the same Mload command, functions same towards the end of the processing.

Adding to the above, if from n"start record" and for n "stop record" are not mentioned, mload considers records from   start till the end of the file.

Which is faster – MultiLoad delete or Delete command?Which is faster – MultiLoad delete or Delete command?Answer

MultiLoad delete is faster then normal Delete command, since the deletion happens in data blocks of 64Kbytes, where as delete command deletes data row by row.  Transient journal maintains entries only for Delete command since Teradata utilities doesn’t support Transient journal loading.

What is Teradata Virtual storage?What is Teradata Virtual storage?Answer

This concept is introduced in TD12. It does the following tasks- maintains information on frequency of data access - tracks data storage task on physical media- migrating frequently used data to fast disks and less frequently used data to slower disks- allocating cyclinders from storage to individual AMPs

Page 31: Teradata Interview

how to start / stop a database in windows?how to start / stop a database in windows?Answer

1. logon to CMD2. check for state pdestate -d3.  run the following command to start  "net start recond"4. check for status  pdestate -d5.  to STOP the database  - Trusted Parallel Application or TPAtpareset -x commentThe -x option stops Teradata without stopping the OS.

What is a role?What is a role?Answer

A role is a set of access rights which can be assigned to the users.   They indirectly help in performance by reducing the number of rows entered in DBC.accessrights

what is a profile?what is a profile?Answer

A profile contains set of user parameters  like accounts, default database, spoolspace, and temporary space for a group of users

To assign the profile to a user, use the AS PROFILE modified in a CREATE USER orMODIFY USER statement:MODIFY USER username AS PROFILE=profilename ;To remove a profile from a member but retain the profile itself:MODIFY USER username AS PROFILE=NULL ;

Why are AMPs and PEs called as vprocs ?Why are   AMPs and PEs called as vprocs ?Answer

AMPs and PEs are implemented as “virtual processors - vprocs”. They run under the control of PDE and their number is software configurable.AMPs are associated with “virtual disks – vdisks” which are associated with logical units (LUNs) within a disk array

Answer2:Vprocs:Virtual process From PE to AMP (This is the network root via MSP(message passing layer),The processing data will store in Disks(These are Physical disks),Each Amp have too many P.disks,to migrate these P.disks The Bynet Network maintains Virtual disks.These V.disks will responsible for data migration.hence they are called as Virtual Process(VPROCS).

What is residual condition in explain plan ?

Page 32: Teradata Interview

What is residual condition in explain plan ?Answer

It is a condition which help u to reduce the number used for join condition. Residual condition does not help in locating a row

How to check if given object is a database or user ?How to check if given object is a database  or user ?Answer

To check whether the given object is user or database , we can use following querysel * from dbc.databases where dbkind  ='U'  or dbkind='D';

In a table can we use primary key in one column and in another column both unique and not null constrains.if yes how?Answers:

Yes, you can have a column for Primary key and have another column which will have no duplicates or null.e.g.A Salary Table will have employee ID as primary key.The table also contains TAX-ID which can not be null or duplicate

How many codd's rules are satisfied by teradata database?Answers:12 codd’s rules.

Explain about Skew Factor?Answers:

The data distribution of table among AMPs is called Skew Factor . Generally For Non-Unique PI we get duplicate values so the more duplicate vales we get more the data have same row hash so all the same data will come to same amp, it makes data distribution inequality,One amp will store more data and other amp stores less amount of data, when we are accessing full table,The amp which is having more data will take longer time and makes other amps waiting which leads processing wastage In this situation (unequal distribution of data)we get Skew Factor HighFor this type of tables we should avoid full table scans.ex:AMP0 AMP110000(10%) 9000000(90%)in this situation skew factor is very high 90%

What is use of compress in teradata?Explain?Answers:

Compression is used to Minimize the table size, for example when the table size is increasing anonymously We can use Compression to reduce the size of the tableConditions:1.Compression can be declared at the time of table creation2.We can compress up to 256 column values(not columns) 3.We can't compress variable length fields (vartext,varchar..)

Explanationfor 2 condition:create table tab1(::Order_type char(25) compress ('air','sea','road'):)in the above example order type have 3 fields, one should be selected by the user, so one of the field will repeat for every order, like these column values we can use compress statement because these are repeating forentire table, like these column values TD supports 256 col generally NCR people will provides ready

Page 33: Teradata Interview

made scripts for these type of compressions However, we can store only one value per column and not 3(air, sea, road). The compressed value isstored in column header and is to be used as default of that column unless a value is present.e.g.Dept of Taxes in Washington has a database that tracks all people working in Washington. Around 99.9% of the tax payers would have Washington as their state code on their address. Instead of storing “Washington” in millions of records the compress will store the value “Washington” in the table header. Now, a value can be assumed in a row as a default unless another value exists inside the column