Transcript of Teradata FAQ
1Q 2002 FAQs

Q1 What is the difference between a Teradata macro and a stored procedure? Which should I use when?

A1 Although both a macro and a stored procedure can, in Teradata, be used for some of the same things (e.g., they both support parameterized SQL), they are really very different. A macro has very restricted control logic, limited to performing or not performing ABORT statements, which undo changes. A stored procedure, on the other hand, supports extensive procedural logic and more sophisticated error handling; it can also perform looping logic or pass along statement values during a procedure. However, macros can return a multirow answer set, while stored procedures can only send back a single set of values through the parameter list. Under some conditions, such as the straightforward insert of multiple rows, a macro will perform better than a stored procedure because it bundles all statements into one multistatement request, which is a single parsing-and-recovery unit. Stored procedures do not support multistatement requests; each insert within a procedure is sent individually to the parser and dispatched as a separate request, eliminating the potential for parallel single-row inserts. For simple activities, particularly those involving multiple, repetitious SQL or requiring multirow answer sets, use macros; where the required logic is beyond what a macro can accommodate, turn to stored procedures.
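As a rough sketch of the contrast, the two might look like this (the table, macro, procedure and parameter names here are hypothetical, not from any shipped example):

```sql
-- Macro: parameterized SQL bundled into one multistatement request.
-- Control logic is limited to ABORT; it can return a multirow answer set.
CREATE MACRO add_order (ord_id INTEGER, amt DECIMAL(10,2)) AS (
    ABORT 'Amount must be positive' WHERE :amt <= 0;
    INSERT INTO orders (order_id, amount) VALUES (:ord_id, :amt);
    SELECT order_id, amount FROM orders WHERE order_id = :ord_id;
);

EXEC add_order (1001, 250.00);

-- Stored procedure: full procedural logic (looping, error handling),
-- but results come back only through the parameter list, and each
-- INSERT is parsed and dispatched as a separate request.
CREATE PROCEDURE add_orders (IN start_id INTEGER, IN how_many INTEGER,
                             OUT inserted INTEGER)
BEGIN
    DECLARE i INTEGER DEFAULT 0;
    WHILE i < how_many DO
        INSERT INTO orders (order_id, amount)
        VALUES (:start_id + :i, 0.00);
        SET i = i + 1;
    END WHILE;
    SET inserted = i;
END;

CALL add_orders (2000, 5, inserted);
```

The macro's three statements travel to the parser as one request and succeed or fail as a unit; the procedure's inserts are dispatched one at a time, which is exactly the performance difference described above.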
Q2 How is data redistributed when I add new hardware to an existing Teradata configuration?

A2 A reconfiguration utility is provided with the DBMS to enable you to add new hardware with minimal intervention and system unavailability. Hash buckets are assigned evenly across all parallel units (AMPs) within the Teradata system and are used by the file system to target data row placement, so a table's data is evenly distributed across all AMPs. When new nodes (and with them new AMPs) are added, a subset of these hash buckets is reassigned in order to include the new nodes in the even spread of data, and the data rows associated with the reassigned hash buckets are migrated onto the new AMPs on the new nodes. The utility moves only the proportion of data equivalent to the percentage increase in the number of AMPs, leaving the majority of the rows untouched; for example, growing from eight AMPs to ten relocates only about 20 percent of the rows. All tables and indexes are automatically relocated over the internal high-speed interconnect, in parallel on all nodes simultaneously, by issuing just a single command. When the reconfigure is complete, the entire system, including all tables and indexes, is immediately and fully available, without the need to rewrite load scripts or make any changes to jobs, queries or database parameters.
Q3 Can Teradata support star schemas?

A3 Teradata has handled "star" and "snowflake" joins for a long time. Often, the optimizer will build a plan that initiates a product join among several dimension tables after qualifying them, and then redistributes the result so it can easily be joined to the large fact table. Teradata customers, especially those in retail, have used this technique very effectively since 1988.
Snowflake joins occur when star tables are joined to other tables in the query. Some of those joins can reduce the cardinality of the star tables. The optimizer recognizes joins that reduce cardinality and executes them prior to performing the product join, resulting in fewer rows and, thus, a shorter merge join phase. The algorithm is sophisticated enough to recognize multiple star joins in a single query and to recognize when one of the star joins is a snowflake of another. Most Teradata optimizations for star joins rely on a large fact table with a composite primary index of columns that are the equi-join fields for the dimension tables. All of the dimension tables with a join field included in the primary index must be included in the star join. This allows the optimizer to join the dimension tables to produce the composite primary index and thus execute a high-performing merge join known as a "row hash match scan."

Teradata is also very good at dealing with queries that do not fit a pattern, no matter what the underlying physical model, so you won't limit the types of queries you can ask by using dimensional modeling in your Teradata system. The Teradata optimizer can build an optimal plan for any query involving the tables, regardless of whether it fits the star join conventions. For this reason, because the Teradata optimizer is good at delivering on complex requests, and because many users have found it easier to add subject areas using a third normal form model, a normalized schema is often recommended. This recommendation is often misinterpreted to mean that third normal form is required with Teradata, when in fact the Teradata technology is completely neutral when it comes to the data model.
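The composite primary index pattern described above can be illustrated as follows (all table and column names are hypothetical): the fact table's primary index is composed of the columns that equi-join to the dimensions, so once the qualified dimensions are product-joined, the optimizer can drive a row hash match scan into the fact table.

```sql
-- Hypothetical retail star schema.
CREATE TABLE sales_fact (
    store_id  INTEGER,
    item_id   INTEGER,
    sale_date DATE,
    amount    DECIMAL(12,2)
) PRIMARY INDEX (store_id, item_id);  -- composite PI on the equi-join columns

-- Every dimension whose join field is in the primary index
-- participates in the star join.
SELECT d.region, i.category, SUM(f.amount)
FROM   sales_fact f
JOIN   store_dim  d ON f.store_id = d.store_id
JOIN   item_dim   i ON f.item_id  = i.item_id
WHERE  d.region   = 'West'
AND    i.category = 'Grocery'
GROUP BY d.region, i.category;
```

Here the qualified rows of store_dim and item_dim can be product-joined to synthesize (store_id, item_id) pairs, which hash directly to the fact rows.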
Q4 What happens when one query runs faster than another during a synchronized scan?

A4 Teradata can perform a sync scan (synchronized full-table scan) if more than one query is scanning the same table. It does this by starting the second query's scan at the data block in the table where the first query is currently positioned; the queries then share the reading of data blocks going forward. When the first one finishes, the second query continues the scan until it reaches its own starting point. While synchronization is encouraged, it's not forced. The optimizer selects a sync scan only on tables too big to fit into the data cache; small tables' blocks are likely to be cached, and therefore their I/Os are already potentially shared. Once a sync scan begins, the DBMS makes a note of it and periodically checks the progress of all participants. It keeps an eye on how many tables are involved in the scan and how much memory is available to support the activity, and it monitors each query's progress. When synchronization is no longer optimal, the DBMS will spin off one or more of the participants, preventing fast queries from being hampered by slow ones.
Q5 When does Teradata reuse spool files within the same request?

A5 The Teradata optimizer can generate a query plan that produces a spool file, or temporary work file, as output in one step for use as input in multiple subsequent steps. This can save significant processing time, and the optimizer will choose to do this whenever it senses that the same intermediate file would otherwise be built in different places in the plan. For instance, it will reuse files when a subquery accesses the same tables with the same selection criteria as the outer query, even when the subquery goes on to aggregate the resulting data and the outer query does not. Correlated subqueries often benefit from reusable spools, as do multistatement requests containing multiple insert/select statements that all select from the same source: Teradata will select from the source table once into a reusable spool file, which then feeds the inserts into the several target tables. Under some conditions, table triggers will result in a plan where reusable spool files maintain both the base table and a trigger-specified table. When tables carrying multiple join indexes are updated in batch mode, e.g., by means of an insert/select operation, the prejoining happens once and the results are held in a reusable spool file, which is then applied to each of the participating join indexes. This eliminates the need to repeat the same prejoin activity for each join index structure requiring maintenance.
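The multistatement case mentioned above can be sketched like this (table names are hypothetical). Both insert/selects read the same source table with the same selection criteria, so when submitted together as one multistatement request the optimizer can spool the source rows once and reuse that spool for each insert:

```sql
-- Submitted as ONE multistatement request (e.g., in BTEQ the leading
-- semicolon on the second statement chains it to the first), not as
-- two independent requests.
INSERT INTO daily_summary
SELECT sale_date, SUM(amount)
FROM   sales_fact
WHERE  sale_date = DATE '2002-01-15'
GROUP BY sale_date

;INSERT INTO daily_detail
SELECT sale_date, store_id, amount
FROM   sales_fact
WHERE  sale_date = DATE '2002-01-15';
```

Sent separately, each statement would scan sales_fact on its own; sent as one request, the common scan can be materialized into a single reusable spool.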
2Q 2002 FAQs

Q1 I am the Medical Director for a large (2.4 million member) health plan insurer, and we are taking a serious look at an integrated enterprise decision support system that uses Teradata as the back-end warehouse and platform. I should note that although I am a physician, I also oversee the medical informatics area, and I am computer (hardware, software, database and SQL) savvy enough to be dangerous. We are currently a mainframe shop with primarily DB2 data marts, which are called warehouses, and a number of distributed and specialized data marts in Oracle residing on UNIX platforms. This is an old company, and there is a lot of vested interest among some areas of the company in the existing systems, which some people had direct input in developing (many years ago).

My purpose in writing, and my question, is this. One of the "objections" I hear raised by these areas about potentially moving to the new structure is that they cannot write SQL directly against the warehouse in a Teradata world. And since that is true (in their view of the world), then obviously this has no value for power users and will not meet the needs of the organization. My view of the world is that we have been brought up on traditional relational databases (one can argue about DB2 filling that role) and the standard use of SQL query tools. We (myself included) do not yet know enough about star schema architecture and how to understand and access the data model. But I refuse to believe that it is not possible to write queries directly against the warehouse. Healthcare is certainly not unique in its need to access data in many different and powerful ways. So, can one write SQL directly against a Teradata warehouse? If not, how does one go directly against the data, bypassing the GUI front ends that power users hate, in order to do complex queries in this environment?

A1 If this is the only issue then we better put your system on the truck and ship it to you!
Seriously though, this is exactly what we do. Our whole philosophy is to integrate all the data in the organization into a single enterprise view: a logical data model that represents the enterprise. We enable access for all users who have a question, regardless of scope or size. All of those questions are asked directly against the enterprise copy of the data, not against