DB2 SQL Tuning Best Practices

8/12/2019 DB2 SQL Tuning Best Practices


DBA BEST PRACTICES

DB2 UDB LUW

SQL TUNING

FEBRUARY 2010


2010 Computer Sciences Corporation.

TABLE OF CONTENTS

1.0 Overview
2.0 Introduction
3.0 UDB DB2 Database Manager Background
4.0 Assumptions
5.0 Best Practices
5.1 Best Practices for Database Configuration
5.1.1 Database Optimization Class Registry Setting
5.1.2 Database Manager Instance Configuration File Parameters
5.1.3 Database Configuration File Parameters
5.1.4 Database Bufferpool and Tablespace Configuration
5.2 Database Table and Index Best Practices
5.2.1 Database Table and Index Design
5.3 UDB DB2 Database RUNSTATS
5.3.1 RUNSTATS Command
5.4 UDB DB2 Database Table Reorganization
5.4.1 REORGANIZE and REORGCHK Commands
5.5 SQL Workload Tuning Best Practices
5.5.1 Prioritize then Divide and Conquer
5.5.2 Get Baseline Run Times and EXPLAIN Plans
5.5.3 Best Practice Coding Techniques
5.5.4 Review Joins and Indexes
5.5.5 Review All Selected Columns and Table Indexes
5.5.6 Retest the Entire Work Load After SQL Performance Tuning
5.5.7 DB2 Index Advisor
db2advis - DB2 design advisor command
5.6 Explain Tools
5.6.1 Visual Explain Tool
Visual Explain
5.6.2 DB2expln Facility
SQL and XQuery explain tool
6.0 Appendix

BUSINESS INTELLIGENCE PRACTICE DATABASE ADMINISTRATION COMPETENCY


1.0 Overview

The intent of this document is to describe the best practices for SQL tuning for DB2 databases in LUW environments. The document covers:

Database Maintenance for Best Practices

Database Configuration for Best Performance

Database Design Issues for Best Performance

SQL Coding for Best Practices

SQL Explain tools for Tuning for Performance

Version   Revision Date   Revised By        Revision Summary
1         02/02/2010      Bruce Woodcraft   Initial draft

2.0 Introduction

This document describes best practices for writing Structured Query Language (SQL) scripts that retrieve data from an IBM DB2 database running on a Linux, UNIX, or Windows (LUW) server. It covers best practices for writing SQL, the database maintenance that affects data retrieval, the database configuration parameters that impact performance, database object design issues for tables and indexes, and the use of the explain tools to assist in performance tuning activities.

SQL Query Tuning Factors can be broken down into several categories:

Database Configuration

Database Object Maintenance

Database Object Design (Tables and Indexes)

SQL Coding Techniques

DB2 Explain Plan Tools

Many factors determine the performance of a given SQL query, and many of them are beyond the control of the SQL query developer. For instance, the DBA controls database configuration parameter settings and table maintenance activities that the SQL developer most likely does not have access to change.

It has been widely documented in the database tuning literature that the SQL query script is the single largest performance factor in more than three out of four cases. For this reason this document focuses most heavily on SQL coding techniques for performance. The other contributing factors are discussed in far less detail, as their remedies are covered in other documents and are beyond the scope of this one.





3.0 UDB DB2 Database Manager Background

Before discussing these SQL tuning factors, we should first consider some background on IBM's Universal DB2 Database Manager for LUW environments. The most important component of the product for running queries to retrieve data is the Optimizer. The optimizer of any Relational Database Management System (RDBMS) provides the intelligence for determining the best steps for accessing and retrieving the data needed to satisfy the query. This set of database tasks is known as the Optimized Access Path. Thus the Optimizer determines how queries will be performed within the database and is the distinguishing component among RDBMSs.

Below is a brief description of DB2's Optimizer from an IBM technical article titled "Coding DB2 SQL for Performance: The Basics".

http://www.ibm.com/developerworks/data/library/techarticle/0210mullins/0210mullins.html#author

The Optimizer

The optimizer is the heart and soul of DB2. It analyzes SQL statements and determines the most efficient access path available for satisfying each statement (see Figure 1). DB2 UDB accomplishes this by parsing the SQL statement to determine which tables and columns must be accessed. The DB2 optimizer then queries system information and statistics stored in the DB2 system catalog to determine the best method of accomplishing the tasks necessary to satisfy the SQL request.

Figure 1. DB2 optimization in action.





The optimizer is equivalent in function to an expert system. An expert system is a set of standard rules that, when combined with situational data, returns an "expert" opinion. For example, a medical expert system takes the set of rules determining which medication is useful for which illness, combines it with data describing the symptoms of ailments, and applies that knowledge base to a list of input symptoms. The DB2 optimizer renders expert opinions on data retrieval methods based on the situational data housed in DB2's system catalog and a query input in SQL format.

The notion of optimizing data access in the DBMS is one of the most powerful capabilities of DB2. Remember, you access DB2 data by telling DB2 what to retrieve, not how to retrieve it. Regardless of how the data is physically stored and manipulated, DB2 and SQL can still access that data. This separation of access criteria from physical storage characteristics is called physical data independence. DB2's optimizer is the component that accomplishes this physical data independence.

If you remove the indexes, DB2 can still access the data (although less efficiently). If you add a column to the table being accessed, DB2 can still manipulate the data without changing the program code. This is all possible because the physical access paths to DB2 data are not coded by programmers in application programs, but are generated by DB2.

Compare this with non-DBMS systems in which the programmer must know the physical structure of the data. If there is an index, the programmer must write appropriate code to use the index. If someone removes the index, the program will not work unless the programmer makes changes. Not so with DB2 and SQL. All this flexibility is attributable to DB2's capability to optimize data manipulation requests automatically.

The optimizer performs complex calculations based on a host of information. To visualize how the

optimizer works, picture the optimizer as performing a four-step process:

1. Receive and verify the syntax of the SQL statement.
2. Analyze the environment and optimize the method of satisfying the SQL statement.
3. Create machine-readable instructions to execute the optimized SQL.
4. Execute the instructions or store them for future execution.

The second step of this process is the most intriguing. How does the optimizer decide how to execute

the vast array of SQL statements that you can send its way?

The optimizer has many types of strategies for optimizing SQL. How does it choose which of these strategies to use in the optimized access paths? IBM does not publish the actual, in-depth details of how the optimizer determines the best access path, but the optimizer is a cost-based optimizer. This means the optimizer will always attempt to formulate an access path for each query that reduces overall cost. To accomplish this, the DB2 optimizer applies query cost formulas that evaluate and weigh four factors for each potential access path: the CPU cost, the I/O cost, statistical information in the DB2 system catalog, and the actual SQL statement.
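The catalog statistics mentioned above can be inspected directly. The following is a minimal sketch, assuming a hypothetical schema SALES and table ORDERS (neither appears in the source article); SYSCAT.TABLES and SYSCAT.INDEXES are the standard DB2 LUW catalog views.

```sql
-- Table-level statistics read by the optimizer.
-- CARD is the row count; a value of -1 means statistics were never collected.
SELECT TABSCHEMA, TABNAME, CARD, NPAGES, FPAGES, OVERFLOW, STATS_TIME
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'SALES'     -- hypothetical schema
  AND TABNAME   = 'ORDERS';   -- hypothetical table

-- Index-level statistics for the same table.
SELECT INDNAME, COLNAMES, UNIQUERULE, NLEAF, NLEVELS,
       FIRSTKEYCARD, FULLKEYCARD, CLUSTERRATIO, STATS_TIME
FROM SYSCAT.INDEXES
WHERE TABSCHEMA = 'SALES'
  AND TABNAME   = 'ORDERS';
```

An old STATS_TIME, or a CARD far from the real row count, suggests the optimizer may be costing access paths from stale information.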





4.0 Assumptions

This document assumes the target audience has some experience with and knowledge of SQL query scripting on some relational database, and it points out specific best practices for using IBM's UDB DB2 Database product for Linux, UNIX, and Windows (LUW) environments. Configuration of the UDB DB2 instance and database parameters is beyond the scope of this paper; these settings are mentioned briefly below because they play an important role in the overall optimization of performance.

5.0 Best Practices

5.1 Best Practices for Database Configuration

This section describes some UDB DB2 system and database configuration parameters that can be changed by a DBA and that could have the greatest impact on SQL query performance. These are examples of the "Other System Information" in the Optimizer diagram (Figure 1) above. These parameters are mentioned here but are covered in more detail in the Best Practices for Database Design for UDB DB2. CAUTION: Only the DBA should consider tuning these settings, as they will impact all database activity, so the utmost level of caution is needed.

5.1.1 DATABASE OPTIMIZATION CLASS REGISTRY SETTING

Changing the setting of the Optimization Class registry variable can provide some of the advantages of explicitly specifying optimization techniques, especially for the following cases:

To manage very small databases or very simple dynamic queries

To accommodate memory limitations at compile time on your database server

To reduce the query compilation time, such as PREPARE

A query optimization class is a set of query rewrite rules and optimization techniques for compiling queries. Per IBM's UDB Information Center for LUW on this subject:

To set the query optimization for dynamic SQL, enter the following command in the

command line processor: SET CURRENT QUERY OPTIMIZATION = n;

Most statements can be adequately optimized with a reasonable amount of resources by using optimization class 5, which is the default query optimization class. At a given optimization class, the query compilation time and resource consumption is primarily influenced by the complexity of the query, particularly the number of joins and subqueries. However, compilation time and resource usage are also affected by the amount of optimization performed.

Query optimization classes 1, 2, 3, 5, and 7 are all suitable for general-purpose use. Consider

class 0 only if you require further reductions in query compilation time and you know that

the SQL statements are extremely simple.
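As a minimal sketch of the command quoted above, the statements below check and change the optimization class for the current session only. The choice of class 1 is illustrative, not a recommendation from the source.

```sql
-- Show the optimization class currently in effect for this session.
VALUES CURRENT QUERY OPTIMIZATION;

-- Lower the class before preparing a batch of very simple dynamic statements
-- (class 1 is an illustrative choice).
SET CURRENT QUERY OPTIMIZATION = 1;

-- ... run the simple dynamic statements here ...

-- Return to the default class described above.
SET CURRENT QUERY OPTIMIZATION = 5;
```

The special register affects only dynamic SQL prepared in the session that sets it, so a session-level change like this is less risky than altering a database-wide default.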





Again, CAUTION should be used when changing this setting. More information and a complete discussion of this setting can be found in the IBM UDB Information Center for LUW. http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp

5.1.2 DATABASE MANAGER INSTANCE CONFIGURATION FILE PARAMETERS

Each UDB DB2 Instance has an Instance Configuration file that contains 68 parameters.

There are a few that have a significant impact on performance which are listed below.

Table source: IBM Redbook DB2 UDB Enterprise Edition V8.1: Basic Performance Tuning Guidelines

http://www.redbooks.ibm.com/redpapers/pdfs/redp4251.pdf



These parameters should be tuned by the database support DBA with CAUTION.

For further detail on these parameters see the source document.

5.1.3 DATABASE CONFIGURATION FILE PARAMETERS

Each UDB DB2 database has its own Database Configuration File which contains 82 different parameters. Below are the parameters that could have the greatest performance impact. Again, use caution when changing any UDB DB2 parameter.





Table source: IBM Redbook DB2 UDB Enterprise Edition V8.1: Basic Performance Tuning Guidelines

http://www.redbooks.ibm.com/redpapers/pdfs/redp4251.pdf

Like the DB2 instance settings that can be tuned, there are many DB2 database configuration settings that can have a significant effect on the performance of the database. Several key settings are: AVG_APPLS, which the Optimizer uses to estimate how much buffer pool memory each application will get; CATALOGCACHE_SZ, which determines how much memory is used to cache the system catalog; and SORTHEAP, which specifies the amount of memory available for each sort operation. The tuning of these parameters is discussed in detail in the IBM Redbook referenced above, in the UDB DB2 Database Tuning Best Practices, and in IBM's UDB DB2 Administration manual.
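A developer can usually read these settings without being able to change them. The query below is a hedged sketch that assumes the SYSIBMADM.DBCFG administrative view of DB2 9.x is available and that the connected user may select from it; it only reads the three parameters named above.

```sql
-- Read-only look at the three database configuration parameters named above.
-- Assumes the SYSIBMADM.DBCFG administrative view (DB2 9.x) is available.
SELECT NAME, VALUE, DEFERRED_VALUE
FROM SYSIBMADM.DBCFG
WHERE NAME IN ('avg_appls', 'catalogcache_sz', 'sortheap')
ORDER BY NAME;
```

A DEFERRED_VALUE that differs from VALUE indicates a change that has been made but has not yet taken effect.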

5.1.4 DATABASE BUFFERPOOL AND TABLESPACE CONFIGURATION

In any database design and configuration, the size and allocation of the database's bufferpools and table spaces have the greatest impact on the database's performance. Buffer pools cache data in memory for reading from and writing to disk, and data is handled much faster in memory than on disk. Generally, there are just a few buffer pools of different page sizes to handle the different table space page sizes. Special purpose buffer pools may be created for specific data and processing methods.

Likewise there are many sizes of tablespaces and special purpose tablespaces. For instance, Temporary Tablespaces are created and assigned to specific buffer pools. UDB DB2 has options for partitioning large tables into multiple tablespaces for data separation and faster I/O performance. Specific data that is used frequently can be set up in its own bufferpool and tablespace so it can stay in memory for fast access. In tuning queries you may come across often-used data that may be separated out and tuned in this fashion. Tablespace changes, and to a lesser extent bufferpool changes, may be needed to optimize a given query workload; they are the responsibility of a DBA and not a developer.
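Before proposing a dedicated bufferpool to the DBA, it helps to see how tablespaces map to bufferpools today. The following read-only sketch uses the standard SYSCAT.TABLESPACES and SYSCAT.BUFFERPOOLS catalog views and changes nothing.

```sql
-- Which bufferpool serves each tablespace, with page sizes and bufferpool size.
-- A negative NPAGES value indicates the size is not a fixed page count
-- (for example, automatic sizing).
SELECT t.TBSPACE,
       t.PAGESIZE AS TBSPACE_PAGESIZE,
       b.BPNAME,
       b.PAGESIZE AS BP_PAGESIZE,
       b.NPAGES   AS BP_PAGES
FROM SYSCAT.TABLESPACES t
JOIN SYSCAT.BUFFERPOOLS b
  ON b.BUFFERPOOLID = t.BUFFERPOOLID
ORDER BY b.BPNAME, t.TBSPACE;
```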

Remember, database configuration changes like the ones mentioned above need to be made with CAUTION, as they could be counterproductive to other queries in the workload, especially if one bufferpool is reduced to create another. It is for this reason that workloads need to be tuned as a group and measured as a group, after individually looking at the slow performers and the most often run queries. (Do not underestimate the improvement that can be made to the overall runtime of a workload by tuning a small query that is run a million times.)



5.2 Database Table and Index Best Practices

Tables organize and group the data that fills the database, while indexes provide maps to specific data in the tables and speed the I/O processing. Good design and planning here have an immediate impact on the database's performance.

5.2.1 DATABASE TABLE AND INDEX DESIGN

Two other key elements of an optimally performing database are the design and function of its tables and indexes. Too often tables are collections of fields whose design reflects no thought for function and use. Indexes get added to give the tables a key, but the design ends there. Tables with too many columns may need to be split into two parts, one with the most used columns and one with the least used columns. Some tables that are constantly joined to another table may be combined for operational efficiency even though the result is not in fourth normal form. More detail on the benefits of good table design can be found in the UDB DB2 Database Design Best Practices. Note however that table design and structure play an important role in tuning every query that reads from a table or joins to it.

UDB DB2 offers a variety of table structures to store and retrieve data for optimal performance. There are Range-Clustered Tables (RCT), Multidimensional Clustering (MDC) tables, and, for even larger tables, Range Partitioned (RP) tables. These table structures have specific indexing methods that are very beneficial when used properly. Again, see the UDB DBA Database Best Practices for more detail on these table structures and indexing methods.

One of the biggest factors affecting query performance is which indexes are available for the optimizer to use. The primary role of indexes is to shorten the path of the access plan so that the data may be retrieved as fast as possible. Indexes perform a powerful service for the database. However, creating too many indexes or adding too many columns to a particular index can be detrimental to the entire workload, especially when adding or updating records in the over-indexed table. Adding indexes to a table is always a tradeoff between retrieval time on one side and maintenance time plus storage space on the other. Usually the retrieval time is more important, and the index maintenance is done during a batch cycle when no one is waiting on it to finish. Also, UDB DB2 v9.7 has new index compression features that make indexes smaller and faster to use, thus offsetting some of the cost associated with an index on a larger table.

Most if not all tables will have an index of some kind. Generally most have a unique index that serves as the Primary Key and is explicitly stated as the Primary Key. (Note: in UDB DB2 the Primary Key can be created as a CONSTRAINT, and an index will be created for it.)

Rule to Remember:

Five to seven indexes per table with five to nine columns at most.





Unique Indexes can be created on columns other than the Primary Key (PK); these are referred to as Alternate Keys (AK). For example, a sequence number (or identity column) may be added to the row to provide a sequential numeric column to use as the PK, while a group of other columns forms the natural key and can be a unique combination of columns. Unique Indexes may "Include" other non-indexed columns, which provide a direct data source for a few table columns. This becomes an extremely effective tool, especially for large rows with lots of columns. Adding a few extra columns to the Unique Index (or AK) permits the I/O to be limited to the index only, saving big row reads. This technique of I/O is known as "Index Only Reads" and is quite efficient compared to reading both the index and the data rows.

In a "Snowflake" or a "Hub and Spoke" data model, where there are a few Fact tables that are linked to numerous Attribute tables, the Fact table should have single column attribute key indexes that match the indexes of the Attribute tables. UDB DB2 has a special join operator called the STAR JOIN which handles this type of join and index processing in a highly efficient way using RID processing and index ANDing. See the IBM UDB Information Center for complete details of the STAR JOIN.
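The Primary Key, Alternate Key, and Include ideas discussed in this section can be sketched as follows. The schema, table, column, and index names are all hypothetical, and whether the final query is actually satisfied by index-only access should be confirmed with an explain plan.

```sql
-- Hypothetical table: an identity column as the PK, CUSTOMER_NO as the natural key.
CREATE TABLE SALES.CUSTOMER (
  CUSTOMER_ID   INTEGER       NOT NULL GENERATED ALWAYS AS IDENTITY,
  CUSTOMER_NO   CHAR(10)      NOT NULL,
  REGION_CODE   CHAR(3)       NOT NULL,
  CUSTOMER_NAME VARCHAR(100)  NOT NULL,
  CREDIT_LIMIT  DECIMAL(11,2),
  NOTES         VARCHAR(2000),
  CONSTRAINT PK_CUSTOMER PRIMARY KEY (CUSTOMER_ID)  -- DB2 creates a unique index for this
);

-- Alternate Key: unique on the natural key, carrying two extra columns.
-- INCLUDE is only allowed on UNIQUE indexes; included columns are stored in the
-- index but do not take part in the uniqueness check.
CREATE UNIQUE INDEX SALES.AK_CUSTOMER_NO
  ON SALES.CUSTOMER (CUSTOMER_NO)
  INCLUDE (CUSTOMER_NAME, CREDIT_LIMIT);

-- Candidate for an index-only read: every referenced column is in AK_CUSTOMER_NO,
-- so the wide data row (including NOTES) need not be fetched.
SELECT CUSTOMER_NAME, CREDIT_LIMIT
FROM SALES.CUSTOMER
WHERE CUSTOMER_NO = 'C000012345';
```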

5.3 UDB DB2 Database RUNSTATS

As we saw in the Optimizer diagram above, the UDB DB2 Database uses system catalog statistical data to assist the optimizer in determining the best steps to retrieve the needed data. This section discusses the importance of this data and the necessity of keeping it up to date.

Rule to Remember:

Use the Primary Key on a table whenever possible, unless another index provides more columns and a faster Access Path.





5.3.1 RUNSTATS COMMAND

The UDB DB2 Database uses catalog statistics and column distribution counts to help the optimizer determine the optimal data access path. Because the optimizer uses these counts to estimate the costs of various steps, these statistics become critical to the decision making process. The RUNSTATS command is used to generate fresh row counts and column distributions after a table has been modified in a significant way since the last time the RUNSTATS command was run.

Rule to Remember:

Run RUNSTATS command after significant changes or a total refresh of a table.
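As a minimal sketch of the rule above, RUNSTATS can be issued from any SQL interface through the SYSPROC.ADMIN_CMD procedure. The table SALES.ORDERS is hypothetical, and the statistics options shown are one common choice rather than a setting taken from this document.

```sql
-- Collect table, distribution, and detailed index statistics.
-- SALES.ORDERS is a hypothetical table; the name must be fully qualified.
CALL SYSPROC.ADMIN_CMD(
  'RUNSTATS ON TABLE SALES.ORDERS WITH DISTRIBUTION AND DETAILED INDEXES ALL'
);

-- Confirm that the catalog now holds fresh numbers.
SELECT CARD, NPAGES, STATS_TIME
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'SALES'
  AND TABNAME   = 'ORDERS';
```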





5.4 UDB DB2 Database Table Reorganization

Another important UDB DB2 Database maintenance command is the REORGANIZE (REORG) command, which rearranges the rows in a table or index while removing the deleted rows.

5.4.1 REORGANIZE AND REORGCHK COMMANDS

UDB DB2 Enterprise Manager uses the REORGCHK command to test tables to see if they need to have the REORGANIZE command run on them.

The REORGCHK command calculates statistics on the database to determine if tables or indexes, or both, need to be reorganized or cleaned up.

Rule to Remember:

Run REORG command after significant deletions and additions to a table or index.

Rule to Remember:

Run REORGCHK command to check to see if a table or index needs to be cleaned up.
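The two rules above can be sketched in SQL form, assuming the SYSPROC.REORGCHK_TB_STATS, SYSPROC.REORGCHK_IX_STATS, and SYSPROC.ADMIN_CMD procedures shipped with DB2 9.x and a hypothetical table SALES.ORDERS. The REORG shown is the default offline form; running it on a production table is a DBA decision.

```sql
-- Check whether the table or its indexes are flagged for reorganization.
-- 'T' means the second argument names a single table. The result set carries a
-- REORG column in which an asterisk marks each formula that exceeded its threshold.
CALL SYSPROC.REORGCHK_TB_STATS('T', 'SALES.ORDERS');
CALL SYSPROC.REORGCHK_IX_STATS('T', 'SALES.ORDERS');

-- If flagged, reorganize the table and its indexes.
CALL SYSPROC.ADMIN_CMD('REORG TABLE SALES.ORDERS');
CALL SYSPROC.ADMIN_CMD('REORG INDEXES ALL FOR TABLE SALES.ORDERS');

-- Refresh statistics so the optimizer sees the reorganized layout.
CALL SYSPROC.ADMIN_CMD(
  'RUNSTATS ON TABLE SALES.ORDERS WITH DISTRIBUTION AND DETAILED INDEXES ALL'
);
```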





5.5 SQL Workload Tuning Best Practices

5.5.1 PRIORITIZE THEN DIVIDE AND CONQUER

In most database environments there is a large set of SQL statements that is run against the database in any given time window. Some statements are repeated daily from on-line applications or report programs; others are ad hoc queries run one time by a single user. After capturing the complete set of statements, subdivide the statements by application and user priority. Reduce the ad hoc queries to a representative subset, as it will be impossible to optimize the database for every query, let alone ad hoc queries that may only be run once. Also identify the queries that are run most often, as optimizing these queries will return big savings over time. Batch report queries need to run efficiently but may not be prioritized as high as on-line screen queries needing sub-second response time.

Review and tune the queries based on their priority and use. Focus on the most important queries and those with the most visibility.

5.5.2 GET BASELINE RUN TIMES AND EXPLAIN PLANS

Once you have determined your Query Workload to tune, get baseline run times and Explain Plans. These will be needed for comparison to measure performance improvement during and at the end of the tuning process.
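One way to record a baseline plan is the EXPLAIN statement, sketched below. It assumes the explain tables already exist in the current schema (they are created from the EXPLAIN.DDL script shipped with DB2); the query, table names, QUERYNO value, and QUERYTAG text are all hypothetical.

```sql
-- Capture the access plan without running the query.
-- QUERYNO and QUERYTAG label the capture so it can be found again later.
EXPLAIN PLAN SET QUERYNO = 101 SET QUERYTAG = 'BASELINE' FOR
SELECT O.ORDER_ID, O.ORDER_DATE, C.CUSTOMER_NAME
FROM SALES.ORDERS O
JOIN SALES.CUSTOMER C
  ON C.CUSTOMER_ID = O.CUSTOMER_ID
WHERE O.ORDER_DATE >= '2010-01-01';

-- Read back the estimated cost (in timerons) of each capture of query 101.
-- EXPLAIN_LEVEL 'P' selects the plan-selection row for the statement.
SELECT QUERYNO, QUERYTAG, EXPLAIN_TIME, TOTAL_COST
FROM EXPLAIN_STATEMENT
WHERE EXPLAIN_LEVEL = 'P'
  AND QUERYNO = 101
ORDER BY EXPLAIN_TIME;
```

Repeating the EXPLAIN with a different QUERYTAG after each tuning change leaves a cost history for the same QUERYNO that can be compared against the baseline row.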

5.5.3 BEST PRACTICE CODING TECHNIQUES

There are some basic SQL coding techniques to follow to ensure the best performance from the SQL script. SQL should be written to return the exact data needed with the minimal steps and amount of data processed. Queries need to use column and row filtering to quickly reduce the possible rows in the return record set. Using indexed columns and simple predicates, and avoiding bad coding techniques, will help the optimizer determine the best data access path for the query. Below are a few guidelines to keep in mind when coding and reviewing SQL scripts for optimal performance.

Keep WHERE Expressions Simple - When it comes to WHERE conditions, the simpler the better. Try to avoid using complex expressions where the expressions prevent the optimizer from using the catalog statistics to estimate an accurate selectivity. The expressions might also limit the choices of access plans that can be used to apply the predicate.

Avoid Functions in JOINS - JOINS will be limited to slower nested loop joins when one of the join predicates contains an expression or function. Also, the expressions may cause the cardinality estimates to be inaccurate and cause the optimizer to select a non-optimal path.

Avoid Expressions on JOIN Columns - Try to avoid using expressions on JOIN columns where an index exists, as the expression would disqualify the use of the index. If possible, try to rewrite the query using indexed columns or try using the inverse operation of the expression. Applying expressions over columns prevents the use of index start and stop keys, leads to inaccurate selectivity estimates, and requires extra processing at query execution time. These expressions also prevent or hamper query rewrite optimization steps.
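The guidelines on expressions can be illustrated with two rewrites. The table SALES.ORDERS and its columns are hypothetical, and the assumption is that ORDER_DATE and ORDER_TOTAL are indexed; actual index use should be confirmed with an explain plan.

```sql
-- Harder to optimize: a function wraps the indexed column, so the index on
-- ORDER_DATE cannot supply start and stop keys.
SELECT ORDER_ID
FROM SALES.ORDERS
WHERE YEAR(ORDER_DATE) = 2009;

-- Rewritten: the bare column is compared to a range.
SELECT ORDER_ID
FROM SALES.ORDERS
WHERE ORDER_DATE >= '2009-01-01'
  AND ORDER_DATE <  '2010-01-01';

-- Harder to optimize: arithmetic applied to the column.
SELECT ORDER_ID
FROM SALES.ORDERS
WHERE ORDER_TOTAL * 1.10 > 1000;

-- Rewritten with the inverse operation moved to the constant side.
-- Results can differ at the boundary because of decimal rounding.
SELECT ORDER_ID
FROM SALES.ORDERS
WHERE ORDER_TOTAL > 1000 / 1.10;
```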

Match JOIN Column Types - Avoid mismatched JOIN values as data type mismatches

prevent the use of hash joins. Also note that if the JOIN column data type is CHAR,

GRAPHIC, DECIMAL or DECFLOAT the lengths must be the same.
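A quick way to check join columns for type or length mismatches is to compare their catalog definitions. The sketch below uses the standard SYSCAT.COLUMNS view; the schema, table, and column names are hypothetical.

```sql
-- Compare the declared types of the two sides of a join predicate.
-- Matching TYPENAME, LENGTH, and SCALE on both rows is the goal.
SELECT TABNAME, COLNAME, TYPENAME, LENGTH, SCALE
FROM SYSCAT.COLUMNS
WHERE TABSCHEMA = 'SALES'
  AND ( (TABNAME = 'ORDERS'   AND COLNAME = 'CUSTOMER_ID')
     OR (TABNAME = 'CUSTOMER' AND COLNAME = 'CUSTOMER_ID') );
```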

Avoid Non-Equality JOINS - JOIN predicates that use comparison operators other than

equality should be avoided because the join method is limited to nested loop. Also, the

optimizer might not be able to compute an accurate selectivity estimate for the JOIN

predicate. When a non-equality JOIN cannot be avoided, be sure an appropriate index exists

on either table because the join predicates will be applied on the nested loop join inner.

Don't Use Distinct Aggregations - The DISTINCT keyword causes a sort of the final result set, making it one of the more expensive sorts. Note that as of DB2 V9 the optimizer will look to take advantage of an index to eliminate a sort for uniqueness, as it already does when optimizing a GROUP BY clause. Rewriting the SQL script using a GROUP BY or using a Sub SELECT (or IN predicate) will usually be more efficient. Also, avoid multiple DISTINCT aggregations [e.g., SUM(DISTINCT colx), AVG(DISTINCT coly)] in the same SELECT, as this becomes very expensive: the optimizer rewrites the original query into separate aggregations and SORTs, one for each DISTINCT keyword specified, and then combines the multiple aggregations using a UNION operation.
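The two rewrites suggested above might look like the following sketch. The tables are hypothetical, and the second rewrite assumes CUSTOMER_ID is the unique key of SALES.CUSTOMER, which is what makes it return the same rows as the DISTINCT form. Whether either rewrite is cheaper on a given DB2 release should be checked against the explain plans.

```sql
-- Original: DISTINCT over a single column.
SELECT DISTINCT REGION_CODE
FROM SALES.CUSTOMER;

-- Rewrite with GROUP BY; returns the same set of values.
SELECT REGION_CODE
FROM SALES.CUSTOMER
GROUP BY REGION_CODE;

-- Original: DISTINCT used to collapse duplicates produced by a join.
SELECT DISTINCT C.CUSTOMER_ID, C.CUSTOMER_NAME
FROM SALES.CUSTOMER C
JOIN SALES.ORDERS O
  ON O.CUSTOMER_ID = C.CUSTOMER_ID;

-- Rewrite with an IN subselect; no duplicates are produced in the first place.
SELECT C.CUSTOMER_ID, C.CUSTOMER_NAME
FROM SALES.CUSTOMER C
WHERE C.CUSTOMER_ID IN (SELECT O.CUSTOMER_ID FROM SALES.ORDERS O);
```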

Avoid Outer Joins Unless Necessary - The left outer join can prevent a number of optimizations, including the use of specialized star-schema join access methods. However, in some cases the left outer join can be automatically rewritten to an inner join by the query optimizer, depending on the other predicates in the SQL script. Use of the inner equijoin is often more efficient, so use it where possible.

Tell Optimizer How Many Rows to Expect - When the size of the result set is known or can be closely estimated, use the OPTIMIZE FOR N ROWS clause along with the FETCH FIRST N ROWS ONLY clause. The OPTIMIZE FOR N ROWS clause indicates to the optimizer that the application intends to retrieve only N rows, but the query will still return the complete result set. The FETCH FIRST N ROWS ONLY clause indicates that the query should return only N rows. Use OPTIMIZE FOR N ROWS along with FETCH FIRST N ROWS ONLY to encourage query access plans that return rows directly from the referenced tables, without first performing a buffering operation such as inserting into a temporary table, sorting, or inserting into a hash join hash table. NOTE: applications that specify OPTIMIZE FOR N ROWS to encourage query access plans that avoid buffering operations, but then retrieve all rows of the result set, could experience degraded performance. This is because the query access plan that returns the first N rows fastest might not be the best query access plan if the entire result set is being retrieved.
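A minimal sketch of the two clauses used together, for a screen that shows a customer's 20 most recent orders. The table, columns, customer number, and row count are hypothetical.

```sql
-- FETCH FIRST limits what is returned; OPTIMIZE FOR tells the optimizer to favor
-- a plan that delivers the first rows quickly.
SELECT ORDER_ID, ORDER_DATE, ORDER_TOTAL
FROM SALES.ORDERS
WHERE CUSTOMER_ID = 1001
ORDER BY ORDER_DATE DESC
FETCH FIRST 20 ROWS ONLY
OPTIMIZE FOR 20 ROWS;
```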

Avoid Redundant Predicates - Eliminate duplicate predicates, especially when they occur across different tables. In some cases, the optimizer cannot detect that the predicates are redundant. This might result in cardinality underestimation and the selection of a sub-optimal access plan. Review the SQL script for columns with the same data but different column names where the same tests are being performed. Again, keep the predicates as simple as possible and remove the same test on similar columns wherever possible.

Select Only the Columns Needed - Avoid using SELECT *, as it returns all the columns for each row returned. This will cause more I/O processing and slow down SORTS with needless data. Also, don't select columns whose value you already know from the SQL script, as this causes more unneeded data handling. For example, SELECT A, B, C WHERE C = 1958 causes column C data to be processed needlessly. Also, don't select columns for sorting or grouping if these columns are not needed in the return data set.

Select Only the Rows Needed - Reducing the set of rows returned in a result set will make the query handle less data and run faster. Use row filter predicates to limit the rows of data being returned. When writing a SQL script with multiple predicates, determine the predicate that will filter out the most data from the result set and place that predicate at the start of the list. By sequencing your predicates in this manner, the subsequent predicates will have less data to filter and process.

Use an INDEX in Place of a SORT - Creating an index on commonly sorted data columns could save a SORT of the result set.
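As a sketch of the last guideline, the index below leads with the filter column and continues with the sort column, so the rows for one customer can be read already in ORDER_DATE order. All names are hypothetical, and whether the SORT operator actually disappears should be verified in the explain plan.

```sql
-- Hypothetical index covering the filter column and the sort column.
CREATE INDEX SALES.IX_ORDERS_CUST_DATE
  ON SALES.ORDERS (CUSTOMER_ID, ORDER_DATE);

-- The query names only the columns it needs and sorts on the indexed column.
SELECT ORDER_ID, ORDER_DATE
FROM SALES.ORDERS
WHERE CUSTOMER_ID = 1001
ORDER BY ORDER_DATE;
```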

5.5.4 REVIEW JOINS AND INDEXES

Table joins should use indexed columns whenever possible for best performance. Review the JOINS and columns used. Ideally use the Primary Key for at least one of the tables. Using index columns in the JOINS permits the optimizer to use the column statistics and index to determine the best access path and could reduce the I/O by using the index rather than the data from the table. The use of indexed columns in filtering predicates reduces the processing required and data handling by utilizing the indexes and index processing methods.

5.5.5 REVIEW ALL SELECTED COLUMNS AND TABLE INDEXES

Selected columns should be reviewed as well as the JOIN columns. The columns needed to satisfy the query may be available in the index used for a table JOIN or an index used for accessing the table. If all of the selected columns are in an index, then I/O processing can be limited to just the index pages. This is known as an Index-Only Read, which is much more efficient than reading both the index and the data table. Note that UNIQUE indexes can have data columns INCLUDED in the index pages. This is very useful when the majority of needed columns are already in the index and another column or two is needed from the data row. If the row contains many columns, having all of the needed columns in an index becomes significantly more efficient than the alternative.

5.5.6 RETEST THE ENTIRE WORK LOAD AFTER SQLPERFORMANCE TUNING

Making index changes while tuning individual SQL statements may have an unplanned impact on other parts of a given workload. It is important to retest the entire workload after tuning the SQL statements individually. Use the recorded baselines to measure the performance improvements, and compare the final explain plans and their estimated TIMERONS (units of estimated run resource cost) against the baseline plans.
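
One way to compare estimated costs is to capture a plan with the EXPLAIN statement before and after tuning, then read the stored estimates back from the explain tables. The sketch below assumes the explain tables already exist in the current schema; the ORDERS table and the query number 101 are arbitrary choices for the example.

```sql
-- Assumes the explain tables have already been created.
-- QUERYNO 101 is an arbitrary tag used to find this statement again.
EXPLAIN PLAN SET QUERYNO = 101 FOR
SELECT order_id, order_date
FROM   orders
WHERE  customer_id = 4711
ORDER  BY order_date;

-- List the estimated cost (in timerons) of each captured plan, oldest
-- first, so the baseline and post-tuning values can be compared.
SELECT explain_time, queryno, total_cost
FROM   explain_statement
WHERE  queryno = 101
AND    explain_level = 'P'
ORDER  BY explain_time;
```

Timerons are optimizer estimates rather than measured times, so they complement the recorded baseline run times and do not replace them.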





5.5.7 DB2 INDEX ADVISOR

DB2 has a tool to review and recommend INDEXES for a specified query workload. This tool reads a file of SQL statements and generates a list of used and recommended indexes for that workload (or statement), as well as a list of unused indexes. The output of this tool specifies the estimated percentage performance improvement for each new recommended index and its expected size.

Note that this tool may recommend a list of indexes to add for a given workload or statement. Adding indexes involves a tradeoff of storage space and processing time, because every index must be maintained whenever rows are inserted, updated, or deleted. Be very cautious when adding indexes.

See the IBM DB2 Information Center for further details of this tool.

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.qb.dbconn.doc/doc/c0004770.html

db2advis - DB2 design advisor command

The DB2 Design Advisor advises users on the creation of materialized query tables (MQTs) and indexes, the repartitioning of tables, the conversion to multidimensional clustering (MDC) tables, and the deletion of unused objects.

The recommendations are based on one or more SQL statements provided by the user. A group of related SQL statements is known as a workload. Users can rank the importance of each statement in a workload and specify the frequency at which each statement in the workload is to be executed. The Design Advisor outputs a DDL CLP script that includes CREATE INDEX, CREATE SUMMARY TABLE (MQT), and CREATE TABLE statements to create the recommended objects.
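
As a sketch of what a workload input file might look like, the example below lists two statements and sets an execution frequency for each with a `--#SET FREQUENCY` line. The file name workload.sql, the tables, and the frequency values are assumptions made for illustration. A file of this kind is passed to db2advis with the -i option, together with -d to name the database.

```sql
-- workload.sql: hypothetical input file for the Design Advisor.
-- Statements end with a semicolon; tables and values are illustrative.

--#SET FREQUENCY 100
SELECT order_id, order_date
FROM   orders
WHERE  customer_id = 4711;

--#SET FREQUENCY 5
SELECT c.customer_name, SUM(o.order_total) AS total_spent
FROM   customers c
JOIN   orders o
       ON o.customer_id = c.customer_id
GROUP  BY c.customer_name;
```

Giving the frequently executed statement a higher frequency lets the advisor weight its recommendations toward the queries that matter most in the workload.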





5.6 Explain Tools

DB2 provides two tools for generating Explain Plans for a given SQL statement. These tools are useful for reviewing and tuning queries, as they identify which indexes are being used and where table scans are being performed.

5.6.1 VISUAL EXPLAIN TOOL

This tool is available from the DB2 Control Center and graphically displays the Explain Plan for the specified SQL statement.


http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.qb.dbconn.doc/doc/c0004770.html

Visual Explain

Visual Explain lets you view the access plan for explained SQL or XQuery statements as a graph. You can use the information available from the graph to tune your queries for better performance.

Important: Access to Visual Explain through the Control Center tools has been deprecated in Version 9.7 and might be removed in a future release. For more information, see "Control Center tools have been deprecated". Accessing Visual Explain functionality through the Data Studio toolset has not been deprecated.

You can use Visual Explain to:

View the statistics that were used at the time of optimization. You can then compare these statistics to the current catalog statistics to help you determine whether rebinding the package might improve performance.

Determine whether or not an index was used to access a table. If an index was not used, Visual Explain can help you determine which columns might benefit from being indexed.

View the effects of performing various tuning techniques by comparing the before and after versions of the access plan graph for a query.

Obtain information about each operation in the access plan, including the total estimated cost and number of rows retrieved (cardinality).

An access plan graph shows details of:

Tables (and their associated columns) and indexes

Operators (such as table scans, sorts, and joins)

Table spaces and functions.

Note: Visual Explain cannot be invoked from the command line, but only from various database objects in the Control Center.

To start Visual Explain:

From the Control Center, right-click a database name and select either Show Explained Statements History or Explain Query.

From the Command Editor, execute an explainable statement on the Interactive page or the Script page.

From the Query Patroller, click Show Access Plan from either the Managed Queries Properties notebook or from the Historical Queries Properties notebook.





5.6.2 DB2EXPLN FACILITY

DB2 comes with an operating-system-level command to generate the Explain Plan for a given SQL statement.


http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.qb.dbconn.doc/doc/c0004770.html

SQL and XQuery explain tool

The db2expln command describes the access plan selected for SQL or XQuery statements.

You can use this tool to obtain a quick explanation of the chosen access plan when explain data was not captured.

For static SQL and XQuery statements, db2expln examines the packages that are stored in the system catalog. For dynamic SQL and XQuery statements, db2expln examines the sections in the query cache.

The explain tool is located in the bin subdirectory of your instance sqllib directory. If db2expln is not in your current directory, it must be in a directory that appears in your PATH environment variable.

The db2expln command uses the db2expln.bnd, db2exsrv.bnd, and db2exdyn.bnd files to bind itself to a database the first time the database is accessed.

Description of db2expln output: Explain output from the db2expln command includes both package information and section information for each package.






6.0 Appendix

