DB2 SQL Tuning Best Practices
8/12/2019 DB2 SQL Tuning Best Practices
DBA BEST PRACTICES
DB2 UDB LUW
SQL TUNING
FEBRUARY 2010
© 2010 Computer Sciences Corporation.
TABLE OF CONTENTS
1.0 Overview
2.0 Introduction
3.0 UDB DB2 Database Manager Background
4.0 Assumptions
5.0 Best Practices
5.1 Best Practices for Database Configuration
5.1.1 Database Optimization Class Registry Setting
5.1.2 Database Manager Instance Configuration File Parameters
5.1.3 Database Configuration File Parameters
5.1.4 Database Bufferpool and Tablespace Configuration
5.2 Database Table and Index Best Practices
5.2.1 Database Table and Index Design
5.3 UDB DB2 Database RUNSTATS
5.3.1 RUNSTATS Command
5.4 UDB DB2 Database Table Reorganization
5.4.1 REORGANIZE and REORGCHK Commands
5.5 SQL Workload Tuning Best Practices
5.5.1 Prioritize then Divide and Conquer
5.5.2 Get Baseline Run Times and EXPLAIN Plans
5.5.3 Best Practice Coding Techniques
5.5.4 Review Joins and Indexes
5.5.5 Review All Selected Columns and Table Indexes
5.5.6 Retest the Entire Work Load After SQL Performance Tuning
5.5.7 DB2 Index Advisor
db2advis - DB2 design advisor command
5.6 Explain Tools
5.6.1 Visual Explain Tool
Visual Explain
5.6.2 DB2expln Facility
SQL and XQuery explain tool
6.0 Appendix
BUSINESS INTELLIGENCE PRACTICE DATABASE ADMINISTRATION COMPETENCY
© 2008 Computer Sciences Corporation.
1.0 Overview
The intent of this document is to describe best practices for SQL tuning for DB2 databases in LUW environments. The document covers:
Database Maintenance for Best Practices
Database Configuration for Best Performance
Database Design Issues for Best Performance
SQL Coding for Best Practices
SQL Explain tools for Tuning for Performance
Version | Revision Date | Revised By | Revision Summary
1 | 02/02/2010 | Bruce Woodcraft | Initial draft
2.0 Introduction
This document describes best practices for writing Structured Query Language (SQL) scripts
which retrieve data from an IBM DB2 database running on a Linux, UNIX, or Windows (LUW) server. It covers best practices for writing SQL, database maintenance that affects
data retrieval, database configuration parameters that impact performance, database object design
issues for tables and indexes, and the use of the explain tools to assist in performance tuning activities.
SQL Query Tuning Factors can be broken down into several categories:
Database Configuration
Database Object Maintenance
Database Object Design (Tables and Indexes)
SQL Coding Techniques
DB2 Explain Plan Tools
Many factors determine the performance of a given SQL query, and many of
them are beyond the control of the SQL query developer. For instance, the DBA controls
database configuration parameter settings and table maintenance activities, but the
SQL developer most likely does not have access to change them.
It has been widely documented in the database tuning annals that the SQL query script is the single largest performance factor in more than three out of four cases. For this reason this
document focuses most on SQL coding techniques for performance. The other
contributing factors are discussed, but in far less detail, as their remedies are detailed in other documents and are beyond the scope of this document.
3.0 UDB DB2 Database Manager Background
Before discussing these SQL tuning factors, we should first consider some background on IBM's
Universal DB2 Database Manager for LUW environments. The most important component of the
product relevant to running queries to retrieve data is the Optimizer. The optimizer for any Relational Database Management System (RDBMS) provides the intelligence for determining the
best steps for accessing and retrieving the data needed to satisfy the query. This set of database
tasks is known as the Optimized Access Path. Thus the Optimizer determines how queries will be performed within the database and is the distinguishing component among RDBMSs.
Below is a brief description of DB2's Optimizer from an IBM technical article titled "Coding
DB2 SQL for Performance: The Basics".
http://www.ibm.com/developerworks/data/library/techarticle/0210mullins/0210mullins.html#author
The Optimizer
The optimizer is the heart and soul of DB2. It analyzes SQL statements and determines the most
efficient access path available for satisfying each statement (see Figure 1). DB2 UDB accomplishes this by parsing the SQL statement to determine which tables and columns must be accessed. The DB2
optimizer then queries system information and statistics stored in the DB2 system catalog to determine
the best method of accomplishing the tasks necessary to satisfy the SQL request.
Figure 1. DB2 optimization in action.
The optimizer is equivalent in function to an expert system. An expert system is a set of standard rules
that, when combined with situational data, returns an "expert" opinion. For example, a medical expert system takes the set of rules determining which medication is useful for which illness, combines it
with data describing the symptoms of ailments, and applies that knowledge base to a list of input
symptoms. The DB2 optimizer renders expert opinions on data retrieval methods based on the situational data housed in DB2's system catalog and a query input in SQL format.
The notion of optimizing data access in the DBMS is one of the most powerful capabilities of DB2. Remember, you access DB2 data by telling DB2 what to retrieve, not how to retrieve it. Regardless of
how the data is physically stored and manipulated, DB2 and SQL can still access that data. This
separation of access criteria from physical storage characteristics is called physical data independence.
DB2's optimizer is the component that accomplishes this physical data independence.
If you remove the indexes, DB2 can still access the data (although less efficiently). If you add a
column to the table being accessed, DB2 can still manipulate the data without changing the program code. This is all possible because the physical access paths to DB2 data are not coded by programmers
in application programs, but are generated by DB2.
Compare this with non-DBMS systems in which the programmer must know the physical structure of
the data. If there is an index, the programmer must write appropriate code to use the index. If someone
removes the index, the program will not work unless the programmer makes changes. Not so with DB2 and SQL. All this flexibility is attributable to DB2's capability to optimize data manipulation
requests automatically.
The optimizer performs complex calculations based on a host of information. To visualize how the
optimizer works, picture the optimizer as performing a four-step process:
1. Receive and verify the syntax of the SQL statement.
2. Analyze the environment and optimize the method of satisfying the SQL statement.
3. Create machine-readable instructions to execute the optimized SQL.
4. Execute the instructions or store them for future execution.
The second step of this process is the most intriguing. How does the optimizer decide how to execute
the vast array of SQL statements that you can send its way?
The optimizer has many types of strategies for optimizing SQL. How does it choose which of these strategies to use in the optimized access paths? IBM does not publish the actual, in-depth details of
how the optimizer determines the best access path, but the optimizer is a cost-based optimizer. This means the optimizer will always attempt to formulate an access path for each query that reduces overall cost. To accomplish this, the DB2 optimizer applies query cost formulas that evaluate and
weigh four factors for each potential access path: the CPU cost, the I/O cost, statistical information in
the DB2 system catalog, and the actual SQL statement.
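To make the "statistical information in the DB2 system catalog" factor more concrete, the queries below are a minimal sketch of how some of those statistics can be inspected. SYSCAT.TABLES, SYSCAT.COLUMNS, and SYSCAT.INDEXES are standard DB2 LUW catalog views; the schema MYSCHEMA and the table ORDERS are hypothetical names used only for illustration.

```sql
-- Table-level statistics: row count (CARD), pages used, and when
-- statistics were last collected. A CARD of -1 means none were gathered.
SELECT TABNAME, CARD, NPAGES, FPAGES, STATS_TIME
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'MYSCHEMA'      -- hypothetical schema
  AND TABNAME   = 'ORDERS';       -- hypothetical table

-- Column-level statistics: number of distinct values and value range.
SELECT COLNAME, COLCARD, LOW2KEY, HIGH2KEY, NUMNULLS
FROM SYSCAT.COLUMNS
WHERE TABSCHEMA = 'MYSCHEMA'
  AND TABNAME   = 'ORDERS';

-- Index-level statistics: size, depth, key cardinality, and clustering.
SELECT INDNAME, COLNAMES, NLEAF, NLEVELS, FIRSTKEYCARD, FULLKEYCARD, CLUSTERRATIO
FROM SYSCAT.INDEXES
WHERE TABSCHEMA = 'MYSCHEMA'
  AND TABNAME   = 'ORDERS';
```

Values of -1 or an old STATS_TIME suggest the optimizer is costing access paths from missing or stale information.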
4.0 Assumptions
This document assumes the target audience has some experience and knowledge of SQL query
scripting with some relational database, and it points out specific best practices for using IBM's
UDB DB2 Database product for Linux, UNIX, and Windows (LUW) environments. Also, UDB DB2 instance and database parameter configuration is beyond the scope of this paper;
these settings are mentioned briefly below because they have an important role in the overall optimization of performance.
5.0 Best Practices
5.1 Best Practices for Database Configuration
This section describes some UDB DB2 system and database configuration parameters
that can be changed by a DBA and that could have the greatest impact on SQL query performance. These are examples of "Other System Information" in the Optimizer
Figure 1 above. These parameters are mentioned here but are covered in more detail in
the Best Practices for Database Design for UDB DB2. CAUTION: Only the DBA should consider tuning these settings, as they will impact all database activity, so the
utmost level of caution is needed.
5.1.1 DATABASE OPTIMIZATION CLASS REGISTRY SETTING
Changing the setting of the Optimization Class registry variable can provide some of the
advantages of explicitly specifying optimization techniques, especially for the following
cases:
To manage very small databases or very simple dynamic queries
To accommodate memory limitations at compile time on your database server
To reduce the query compilation time, such as PREPARE
A query optimization class is a set of query rewrite rules and optimization techniques for
compiling queries. Per IBM's UDB Information Center for LUW on this subject:
To set the query optimization for dynamic SQL, enter the following command in the
command line processor: SET CURRENT QUERY OPTIMIZATION = n;
Most statements can be adequately optimized with a reasonable amount of resources by
using optimization class 5, which is the default query optimization class. At a given
optimization class, the query compilation time and resource consumption is primarily
influenced by the complexity of the query, particularly the number of joins and subqueries. However, compilation time and resource usage are also affected by the amount of
optimization performed.
Query optimization classes 1, 2, 3, 5, and 7 are all suitable for general-purpose use. Consider
class 0 only if you require further reductions in query compilation time and you know that
the SQL statements are extremely simple.
Again, CAUTION should be used when changing this setting. More information and a
complete discussion of this setting can be found in the IBM UDB Information Center for
LUW. http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp
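As a minimal sketch of the SET CURRENT QUERY OPTIMIZATION statement quoted above, the following shows one way to lower the class for a batch of simple dynamic statements and then restore it. The choice of class 3 is purely illustrative, and the statement changes the class only for the current session.

```sql
-- Show the optimization class currently in effect for this session.
VALUES CURRENT QUERY OPTIMIZATION;

-- Use a lower class for a run of simple dynamic queries (illustrative value).
SET CURRENT QUERY OPTIMIZATION = 3;

-- ... run the simple dynamic statements here ...

-- Return to class 5, the default class named in the text above.
SET CURRENT QUERY OPTIMIZATION = 5;
```

Comparing compile time and explain output before and after the change is a safer basis for keeping it than assuming a lower class is faster overall.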
5.1.2 DATABASE MANAGER INSTANCE CONFIGURATION FILE PARAMETERS
Each UDB DB2 Instance has an Instance Configuration file that contains 68 parameters.
There are a few that have a significant impact on performance which are listed below.
Table source: IBM Redbook DB2 UDB Enterprise Edition V8.1: Basic Performance Tuning Guidelines
http://www.redbooks.ibm.com/redpapers/pdfs/redp4251.pdf
These parameters should be tuned by the database support DBA with CAUTION.
For further detail on these parameters see the source document.
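For orientation only, a DBA could review and change instance-level parameters with CLP commands like the ones sketched below (for example, in a script run with db2 -tvf). The parameter table from the source is not reproduced in this transcript, so SHEAPTHRES is used here simply as one real instance parameter; it is an assumption that it is relevant to a given system, and the value shown is a placeholder, not a recommendation.

```sql
-- Display the current database manager (instance) configuration.
GET DBM CFG;

-- Change one instance-level parameter (placeholder value, DBA use only).
-- Some instance parameters take effect only after the instance is restarted.
UPDATE DBM CFG USING SHEAPTHRES 20000;
```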
5.1.3 DATABASE CONFIGURATION FILE PARAMETERS
Each UDB DB2 database has its own Database Configuration File which contains 82
different parameters. Below are the parameters that could have the greatest performance impact. Again use caution when changing any UDB DB2 parameter.
Table source: IBM Redbook DB2 UDB Enterprise Edition V8.1: Basic Performance Tuning Guidelines
http://www.redbooks.ibm.com/redpapers/pdfs/redp4251.pdf
Like the DB2 instance settings that can be tuned, there are many DB2 database configuration settings that can have a significant effect on performance of the database.
Several key settings are: AVG_APPLS, which the Optimizer uses to estimate how much
buffer pool memory each application will get; CATALOGCACHE_SZ, which determines how
much memory is used to cache the system catalog; and SORTHEAP, which specifies the amount of memory to be available for each sort operation. The tuning of these
parameters is discussed in detail in the IBM Redbook referenced above and under the
UDB DB2 Database Tuning Best Practices and IBM's UDB DB2 Administration manual.
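The three database-level parameters named above can be inspected and changed with CLP commands along the following lines. This is a sketch, not a tuning recommendation: the database name SAMPLE and every value shown are placeholders chosen for illustration.

```sql
-- Display the current configuration of one database (hypothetical name SAMPLE).
GET DB CFG FOR SAMPLE;

-- Adjust the per-sort memory (in 4 KB pages); placeholder value.
UPDATE DB CFG FOR SAMPLE USING SORTHEAP 2048;

-- Several parameters can be changed in one command; placeholder values.
UPDATE DB CFG FOR SAMPLE USING CATALOGCACHE_SZ 400 AVG_APPLS 5;
```

Recording the old values before each change makes it straightforward to back the change out if the retested workload gets slower.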
5.1.4 DATABASE BUFFERPOOL AND TABLESPACE CONFIGURATION
In any database design and configuration, the size and allocation of the database's
bufferpools and table spaces have the greatest impact on the database's
performance. Buffer pools cache data in memory for reading from and writing to disk, and data is handled much faster in memory than on disk. Generally,
there are just a few buffer pools of different page sizes to match the different table space page sizes.
Special purpose buffer pools may be created for specific data and processing methods.
Likewise there are many sizes of tablespaces and special purpose tablespaces. For
instance, Temporary Tablespaces are created and assigned to specific buffer pools. UDB
DB2 has options for partitioning large tables into multiple tablespaces for data separation and faster I/O performance. Specific data that is used frequently can be set up in its own
bufferpool and tablespace so it can stay in memory for fast access. In tuning queries you
may come across often-used data that may be separated out and tuned in this fashion.
Tablespace changes, and to a lesser extent bufferpool changes, may be needed to optimize a given query workload and would be the responsibility of a DBA and not a
developer.
Remember, database configuration changes like the ones mentioned above need to be
made with CAUTION as they could be counterproductive to other queries in the
workload, especially if one bufferpool is reduced to create another. It is for this reason that workloads need to be tuned as a group and measured as a group after individually looking
at the slow performers and the most often run queries. (Do not underestimate the
improvement that can be made to the overall runtime of a work load for a small query
that is run a million times.)
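The idea of giving frequently used data its own bufferpool and tablespace can be sketched as follows. All object names, the 8K page size, and the bufferpool size are hypothetical, and the tablespace definition assumes the database is enabled for automatic storage; a DBA would choose the real values.

```sql
-- Dedicated 8K bufferpool for frequently accessed data
-- (size is in pages; placeholder value).
CREATE BUFFERPOOL BP_HOT_8K SIZE 5000 PAGESIZE 8K;

-- Tablespace with a matching page size, assigned to that bufferpool.
-- Assumes an automatic storage database.
CREATE TABLESPACE TS_HOT_8K
  PAGESIZE 8K
  MANAGED BY AUTOMATIC STORAGE
  BUFFERPOOL BP_HOT_8K;

-- A hypothetical lookup table placed in the dedicated tablespace.
CREATE TABLE MYSCHEMA.STATUS_CODES (
  STATUS_CODE  SMALLINT     NOT NULL PRIMARY KEY,
  DESCRIPTION  VARCHAR(100) NOT NULL
) IN TS_HOT_8K;
```

The page size of the tablespace and the bufferpool must match, which is why both are declared as 8K here.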
5.2 Database Table and Index Best Practices
Tables organize and group the data that fills the database, while indexes provide maps to
specific data in the tables and speed the I/O processing. Good design and planning here will immediately impact the database's performance.
5.2.1 DATABASE TABLE AND INDEX DESIGN
Two other key elements of an optimally performing database are the design and function of the tables and indexes that were designed for it. Too often tables are collections of fields
with no thought for function and use put into their design. Indexes get added to
provide the tables a key, but the design ends there. Tables with too many columns may need to
be split into two parts, one with the most used columns and one with the least used columns. Some tables that are constantly joined to another table may be combined for
operational efficiency, even though the result is not in fourth normal form. More detail on the benefits
of good table design can be found in the UDB DB2 Database Design Best Practices. Note, however, that table design and structure play an important role in
the tuning of every query that reads from the table or joins to it.
UDB DB2 offers a variety of table structures to store and retrieve data for optimal
performance. There are Range-Clustered Tables (RCT), MultiDimensional Clustering tables (MDC), and, for even larger tables, Range Partitioned (RP) tables. These
table structures have specific indexing methods that are very beneficial when used
properly. Again see the UDB DBA Database Best Practices for more detail on these table structures and indexing methods.
One of the biggest factors affecting query performance is what indexes are available for
the optimizer to use. The primary role of indexes is to shorten the path of the access plan
so that the data may be retrieved as fast as possible. Indexes perform an awesome and powerful service for the database. However, creating too many indexes or adding too many columns to a particular index can be detrimental to the entire workload, especially
when adding or updating records in that over-indexed table. Adding indexes to a table is
always a tradeoff between retrieval time and maintenance time plus storage space. Usually the retrieval time is more important and the indexing is done during a batch cycle
when no one is waiting on it to finish. Also, UDB DB2 v9.7 has new index compression
features that make indexes smaller and faster to use, thus offsetting some of the cost associated with an index on a larger table.
Most if not all tables will have an index of some kind. Generally most have a unique
index that serves as the Primary Key and is explicitly stated as the Primary Key. (Note: in
UDB DB2 the Primary Key can be created as a CONSTRAINT and will have an index created for it.)
Rule to Remember:
Five to seven indexes per table with five to nine columns at most.
Unique Indexes other than the Primary Key (PK) can be created on tables and
are referred to as Alternate Keys (AK). For example, a sequence number (or identity column) may be added to the row to provide a sequential numeric column to use as the PK, while a
group of other columns forms the natural key and can be a unique combination of
columns. Unique Indexes may INCLUDE other non-indexed columns, making the index a direct data source for a few table columns. This becomes an extremely effective tool,
especially for large rows with lots of columns. Adding a few extra columns to the
Unique Index (or AK) permits the I/O to be limited to the index only, saving big row
reads. This technique of I/O is known as Index Only Reads and is quite efficient
compared to reading both the index and the data rows.
In a "Snowflake" or a "Hub and Spoke" data model, where there are a few Fact tables
that are linked to numerous Attribute tables, the Fact table should have single column
attribute key indexes that match the indexes of the Attribute tables. UDB DB2 has a special join operator called the STAR JOIN which handles this type of join and index
processing in a highly efficient way using RID processing and index ANDing. See the
IBM UDB Information Center for complete details of the STAR JOIN.
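The Alternate Key with INCLUDE columns described earlier in this section can be sketched as shown below. The table, column, and index names are hypothetical; the INCLUDE clause itself is part of the DB2 CREATE UNIQUE INDEX statement.

```sql
-- Hypothetical table: ORDER_ID is the surrogate Primary Key,
-- ORDER_NO is the natural (alternate) key.
CREATE TABLE MYSCHEMA.ORDERS (
  ORDER_ID     INTEGER       NOT NULL PRIMARY KEY,
  ORDER_NO     CHAR(12)      NOT NULL,
  STATUS       CHAR(1)       NOT NULL,
  ORDER_TOTAL  DECIMAL(11,2) NOT NULL,
  NOTES        VARCHAR(2000)
);

-- Unique index on the alternate key. STATUS and ORDER_TOTAL are carried
-- in the index pages but are not part of the uniqueness check.
CREATE UNIQUE INDEX MYSCHEMA.IX_ORDERS_AK
  ON MYSCHEMA.ORDERS (ORDER_NO)
  INCLUDE (STATUS, ORDER_TOTAL);

-- Every column referenced here is in the index, so the optimizer
-- may be able to answer the query without reading the wide data rows.
SELECT ORDER_NO, STATUS, ORDER_TOTAL
FROM MYSCHEMA.ORDERS
WHERE ORDER_NO = 'A00000001234';
```

Whether index-only access is actually chosen can be checked in the explain output for the query.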
5.3 UDB DB2 Database RUNSTATS
As we have seen in the Optimizer diagram above, the UDB DB2 Database uses system catalog statistical data to assist the optimizer in determining the best steps to retrieve the
needed data. The section below discusses the importance of this data and the necessity of
keeping it up to date.
Rule to Remember:
Use the Primary Key on a table whenever possible, unless another index provides more columns and a faster Access Path.
5.3.1 RUNSTATS COMMAND
The UDB DB2 Database uses catalog statistics and column distribution counts to assist the optimizer in determining the optimal data access path. Because the optimizer uses these
counts to estimate the costs of various steps, these statistics become critical to the
decision making process. The RUNSTATS command is used to generate fresh row counts and column distributions after a table has been modified in a significant way since
the last time the RUNSTATS command was run.
Rule to Remember:
Run RUNSTATS command after significant changes or a total refresh of a table.
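A typical invocation, run from the DB2 command line processor, might look like the sketch below. The table name is hypothetical and must be fully qualified; the options shown collect distribution statistics and detailed statistics for all indexes, which is one common choice rather than the only valid one.

```sql
-- Refresh table, distribution, and index statistics after a large load
-- or refresh (hypothetical table name).
RUNSTATS ON TABLE MYSCHEMA.ORDERS
  WITH DISTRIBUTION AND DETAILED INDEXES ALL;

-- Confirm the collection time recorded in the catalog.
SELECT CARD, STATS_TIME
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'MYSCHEMA'
  AND TABNAME   = 'ORDERS';
```

Static SQL packages only pick up new statistics after they are rebound, so a rebind step usually follows RUNSTATS for such applications.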
5.4 UDB DB2 Database Table Reorganization
Another important UDB DB2 Database maintenance command is the REORGANIZE (REORG)
command, which rearranges the rows in a table or index while removing the deleted rows.
5.4.1 REORGANIZE AND REORGCHK COMMANDS
UDB DB2 Enterprise Manager uses the REORGCHK command to test tables to see if
they need to have the REORGANIZE command run on them.
The REORGCHK command calculates statistics on the database to determine if tables or indexes, or both, need to be reorganized or cleaned up.
Rule to Remember:
Run REORG command after significant deletions and additions to a table or index.
Rule to Remember:
Run REORGCHK command to check to see if a table or index needs to be cleaned up.
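The two rules above can be combined into a short maintenance sequence, sketched here as CLP commands. The table name is hypothetical, and whether to reorganize should be decided from the REORGCHK output, where asterisks in the REORG column flag the formulas that suggest a reorganization.

```sql
-- Check whether the table or its indexes need reorganizing,
-- refreshing statistics first so the check uses current numbers.
REORGCHK UPDATE STATISTICS ON TABLE MYSCHEMA.ORDERS;

-- If flagged, reorganize the table data and then its indexes.
REORG TABLE MYSCHEMA.ORDERS;
REORG INDEXES ALL FOR TABLE MYSCHEMA.ORDERS;

-- Collect fresh statistics so the optimizer sees the reorganized layout.
RUNSTATS ON TABLE MYSCHEMA.ORDERS
  WITH DISTRIBUTION AND DETAILED INDEXES ALL;
```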
5.5 SQL Workload Tuning Best Practices
5.5.1 PRIORITIZE THEN DIVIDE AND CONQUER
In most database environments there is a large set of SQL statements that is run against
the database in any given time window. Some statements are repeated daily from on-line applications or report programs; others are ad hoc queries run one time by a single user.
After capturing the complete set of statements, subdivide the statements by application
and user priority. Also reduce the ad hoc queries to a representative subset, as it will be
impossible to optimize the database for every query, let alone ad hoc queries that may only be run once. Also identify the queries that are run most often, as optimizing these
queries will return big savings over time. Batch report queries need to run efficiently but
may not be prioritized as high as on-line screen queries needing sub-second response time.
Review and tune the queries based on their priority and use. Focus on the most important
queries and those with the most visibility.
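One possible way to find the most frequently run dynamic statements is the SYSIBMADM.TOP_DYNAMIC_SQL administrative view available in DB2 9.x. The sketch below assumes that view is present in the installed version and that the user is authorized to read snapshot data; it is not a method prescribed by the source.

```sql
-- Dynamic statements ranked by how often they ran.
SELECT NUM_EXECUTIONS,
       AVERAGE_EXECUTION_TIME_S,
       STMT_SORTS,
       SUBSTR(STMT_TEXT, 1, 200) AS STMT_TEXT
FROM SYSIBMADM.TOP_DYNAMIC_SQL
ORDER BY NUM_EXECUTIONS DESC
FETCH FIRST 20 ROWS ONLY;

-- The same view ranked by approximate total elapsed time, which brings
-- small but very frequently executed queries to the top as well.
SELECT NUM_EXECUTIONS * AVERAGE_EXECUTION_TIME_S AS APPROX_TOTAL_TIME_S,
       NUM_EXECUTIONS,
       SUBSTR(STMT_TEXT, 1, 200) AS STMT_TEXT
FROM SYSIBMADM.TOP_DYNAMIC_SQL
ORDER BY 1 DESC
FETCH FIRST 20 ROWS ONLY;
```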
5.5.2 GET BASELINE RUN TIMES AND EXPLAIN PLANS
Once you have determined your Query Workload to tune, get baseline run times and
Explain Plans. These will be needed for comparison to measure performance
improvement during and at the end of tuning process.
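A baseline explain plan can be captured along the lines sketched below. This assumes the explain tables already exist (they can be created from the EXPLAIN.DDL script shipped in the sqllib/misc directory); the query, the database name SAMPLE, and the output file name are hypothetical.

```sql
-- Capture the access plan without executing the statement.
EXPLAIN PLAN FOR
SELECT O.ORDER_NO, O.ORDER_TOTAL
FROM MYSCHEMA.ORDERS O
WHERE O.STATUS = 'O';

-- Then, from the operating system shell, format the most recent plan
-- to a text file kept as the baseline for later comparison:
--   db2exfmt -d SAMPLE -1 -o baseline_q1.txt
```

Keeping one formatted plan file per workload query, alongside its measured run time, makes the before and after comparison mechanical.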
5.5.3 BEST PRACTICE CODING TECHNIQUES
There are some basic SQL coding techniques to follow to ensure the best performance from the SQL script. SQL should be written to return the exact data needed with the
minimal steps and amount of data processed. Queries need to use column and row filtering to quickly reduce the possible rows in the return record set. Using indexed columns and simple predicates, and avoiding bad coding techniques, will help the optimizer
determine the best data access path for the query. Below are a few guidelines to keep in
mind when coding and reviewing SQL scripts for optimal performance.
Keep WHERE Expressions Simple - When it comes to WHERE conditions, the simpler the
better. Try to avoid using complex expressions where the expressions prevent the optimizer
from using the catalog statistics to estimate an accurate selectivity. The expressions might
also limit the choices of access plans that can be used to apply the predicate.
Avoid Functions in JOINS - JOINS will be limited to slower nested loop joins when one of the join
predicates contains an expression or function. Also the expressions may cause the
cardinality estimates to be inaccurate and cause the optimizer to select a non-optimal path.
Avoid Expressions on JOIN Columns - Try to avoid using expressions on JOIN columns where
an index exists, as the expression would disqualify the use of the index. If possible try to rewrite the query
using indexed columns or try using the inverse operations of the expressions. Applying
expressions over columns prevents the use of index start and stop keys, leads to inaccurate
selectivity estimates, and requires extra processing at query execution time. These
expressions also prevent or hamper query rewrite optimization steps as well.
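The following sketch illustrates rewriting an expression over a column into a plain predicate on the column. The table and column names are hypothetical, and the rewrites are equivalent only under the stated assumptions (ORDER_DATE is a DATE column, SALARY is non-null numeric).

```sql
-- Expression applied to the column: an index on ORDER_DATE cannot be
-- used with start and stop keys, and selectivity is harder to estimate.
SELECT ORDER_ID
FROM MYSCHEMA.ORDERS
WHERE YEAR(ORDER_DATE) = 2009;

-- Equivalent range predicate on the bare column.
SELECT ORDER_ID
FROM MYSCHEMA.ORDERS
WHERE ORDER_DATE >= '2009-01-01'
  AND ORDER_DATE <  '2010-01-01';

-- Using the inverse operation: move the arithmetic to the constant side.
-- Instead of:  WHERE SALARY * 1.1 > 50000
SELECT EMP_ID
FROM MYSCHEMA.EMPLOYEES
WHERE SALARY > 50000 / 1.1;
```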
Match JOIN Column Types - Avoid mismatched JOIN column data types, as data type mismatches
prevent the use of hash joins. Also note that if the JOIN column data type is CHAR,
GRAPHIC, DECIMAL or DECFLOAT the lengths must be the same.
Avoid Non-Equality JOINS - JOIN predicates that use comparison operators other than
equality should be avoided because the join method is limited to nested loop. Also, the
optimizer might not be able to compute an accurate selectivity estimate for the JOIN
predicate. When a non-equality JOIN cannot be avoided, be sure an appropriate index exists
on either table because the join predicates will be applied on the nested loop join inner.
Don't Use Distinct Aggregations - The DISTINCT keyword causes a sort of the final result set,
making it one of the more expensive sorts. Note that there are changes as of DB2 V9 where
the optimizer will look to take advantage of an index to eliminate a sort for uniqueness, as it
already does when optimizing a GROUP BY clause. Rewriting the SQL script
using a GROUP BY or using a Sub SELECT (or IN predicate) will usually be more efficient.
Also, avoid multiple DISTINCT aggregations [e.g., SUM(DISTINCT colx), AVG(DISTINCT coly)] in the
same SELECT, as this becomes very expensive: the optimizer rewrites the original query
into separate aggregations and SORTs, one for each DISTINCT keyword, and then
combines the multiple aggregations using a UNION operation.
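The two rewrites mentioned above (GROUP BY and an IN subselect) might look like the sketch below. The tables are hypothetical, and the second pair of queries returns the same rows only on the assumption that CUSTOMER_ID is the unique key of CUSTOMERS.

```sql
-- DISTINCT over a single table ...
SELECT DISTINCT CUSTOMER_ID
FROM MYSCHEMA.ORDERS
WHERE ORDER_DATE >= '2009-01-01';

-- ... expressed with GROUP BY instead.
SELECT CUSTOMER_ID
FROM MYSCHEMA.ORDERS
WHERE ORDER_DATE >= '2009-01-01'
GROUP BY CUSTOMER_ID;

-- DISTINCT used only to remove duplicates introduced by a join ...
SELECT DISTINCT C.CUSTOMER_ID, C.NAME
FROM MYSCHEMA.CUSTOMERS C
JOIN MYSCHEMA.ORDERS O ON O.CUSTOMER_ID = C.CUSTOMER_ID;

-- ... rewritten with an IN subselect, so no duplicates are produced
-- in the first place (assumes CUSTOMER_ID is unique in CUSTOMERS).
SELECT C.CUSTOMER_ID, C.NAME
FROM MYSCHEMA.CUSTOMERS C
WHERE C.CUSTOMER_ID IN (SELECT O.CUSTOMER_ID FROM MYSCHEMA.ORDERS O);
```

Comparing the explain plans of each pair shows whether the SORT for uniqueness actually disappears on a given system.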
Avoid Outer Joins Unless Necessary - The left outer join can prevent a number of
optimizations, including the use of specialized star-schema join access methods. However,
in some cases the left outer join can be automatically rewritten to an inner join by the query
optimizer depending on the other predicates in the SQL script. Use of the inner equijoin is
often more efficient, so use it where possible.
Tell Optimizer How Many Rows to Expect - When the result set size is known or can be closely
estimated, use the OPTIMIZE FOR N ROWS clause along with the FETCH FIRST N ROWS ONLY
clause. The OPTIMIZE FOR N ROWS clause indicates to the optimizer that the application
intends to retrieve only N rows, but the query will return the complete result set. The FETCH
FIRST N ROWS ONLY clause indicates that the query should return only N rows. Use OPTIMIZE
FOR N ROWS along with FETCH FIRST N ROWS ONLY to encourage query access plans that
return rows directly from the referenced tables, without first performing a buffering
operation such as inserting into a temporary table, sorting, or inserting into a hash join hash
table. NOTE: applications that specify OPTIMIZE FOR N ROWS to encourage query access plans that
avoid buffering operations, but then retrieve all rows of the result set, could experience degraded
performance. This is because the query access plan that returns the first N rows fastest might not be the best query access plan if the entire result set is being retrieved.
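As a brief illustration of the two clauses used together, the sketch below returns the 20 most recent orders for one customer. The table, columns, and the value 20 are hypothetical.

```sql
-- FETCH FIRST limits the result to 20 rows; OPTIMIZE FOR tells the
-- optimizer to favor a plan that delivers those first rows quickly.
SELECT ORDER_ID, ORDER_DATE, ORDER_TOTAL
FROM MYSCHEMA.ORDERS
WHERE CUSTOMER_ID = 1001
ORDER BY ORDER_DATE DESC
FETCH FIRST 20 ROWS ONLY
OPTIMIZE FOR 20 ROWS;
```

An index on (CUSTOMER_ID, ORDER_DATE) would give the optimizer a way to return these rows in order without a sort, which is the kind of non-buffering plan the clauses are meant to encourage.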
Avoid Redundant Predicates - Eliminate duplicate predicates, especially when they occur
across different tables. In some cases, the optimizer cannot detect that the predicates are
redundant. This might result in cardinality underestimation and the selection of a sub-optimal
access plan. Review the SQL script for columns with the same data but different column
names where the same tests are being performed. Again keep the predicates as simple as
possible and remove the same test on similar columns wherever possible.
Select Only the Columns Needed - Avoid using SELECT *, as it returns all the columns for
each row returned. This will cause more I/O processing and slow down SORTS with
needless data. Also, don't select columns whose value you already know from the SQL script, as this
causes more unneeded data handling. For example, SELECT A, B, C WHERE C = 1958
causes column C data to be processed needlessly. Also, don't select columns for sorting or
grouping if these columns are not needed in the return data set.
Select Only the Rows Needed - Reducing the set of rows returned in a result set will make
the query handle less data and run faster. Use row filter predicates to limit the rows of data
being returned. When writing a SQL script with multiple predicates, determine the
predicate that will filter out the most data from the result set and place that predicate at the
start of the list. By sequencing your predicates in this manner, the subsequent predicates
will have less data to filter and process.
Use an INDEX in Place of a SORT - Creating an index on commonly sorted data columns
could save a SORT of the result set.
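A minimal sketch of this technique, assuming a hypothetical ORDERS table and an index name chosen for illustration, is shown below. Whether the optimizer actually uses the index to avoid the sort depends on the statistics and estimated costs, so the result should be confirmed with an explain plan.

```sql
-- Hypothetical table, columns, and index name; for illustration only.
-- The index key order matches the filter column followed by the sort column.
CREATE INDEX ix_orders_cust_date
    ON orders (customer_id, order_date DESC);

-- The rows for one customer can now be read from the index already in
-- ORDER BY sequence, which may make a separate SORT unnecessary.
SELECT order_id, order_date
  FROM orders
 WHERE customer_id = 1001
 ORDER BY order_date DESC;
```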
5.5.4 REVIEW JOINS AND INDEXES
Table joins should use indexed columns whenever possible for best performance.
Review the JOINS and the columns used. Ideally, use the Primary Key for at least one of the
tables. Using indexed columns in the JOINS permits the optimizer to use the column
statistics and index to determine the best access path, and could reduce I/O by using
the index rather than the data from the table. The use of indexed columns in filtering
predicates reduces the processing and data handling required by utilizing the indexes and
index processing methods.
5.5.5 REVIEW ALL SELECTED COLUMNS AND TABLE INDEXES
Selected columns should be reviewed as well as the JOIN columns. The columns needed to
satisfy the query may be available in the index used for a table JOIN or an index used for
accessing the table. If all of the selected columns are in an index, then I/O processing can
be limited to just the index pages. This is known as an Index-Only Read, which is much
more efficient than reading both the index and the data table. Note that UNIQUE indexes
can have data columns INCLUDED in the index pages. This is very useful when the
majority of needed columns are already in the index and another column or two is
needed from the data row. If the row contains many columns, having all of the needed
columns in an index becomes significantly more efficient than the alternative.
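The INCLUDE option described above can be sketched as follows. The CUSTOMER table, its columns, and the index name are assumptions made for this example; if a unique index on the key column already exists (for instance, one backing a primary key), the existing definition would need to be reviewed before adding another.

```sql
-- Hypothetical table, columns, and index name; for illustration only.
-- Uniqueness is enforced on CUSTOMER_ID alone; LAST_NAME and POSTAL_CODE
-- are carried in the index pages as non-key INCLUDE columns.
CREATE UNIQUE INDEX ix_customer_id
    ON customer (customer_id)
    INCLUDE (last_name, postal_code);

-- Every selected column is now present in the index, so this query is a
-- candidate for index-only access without reading the data pages.
SELECT customer_id, last_name, postal_code
  FROM customer
 WHERE customer_id = 1001;
```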
5.5.6 RETEST THE ENTIRE WORK LOAD AFTER SQL PERFORMANCE TUNING
Making index changes while tuning individual SQL statements may have an unplanned
impact on other parts of a given workload. It is important to retest the entire workload
after tuning the SQL statements individually. Use the recorded baselines to compare
performance improvements. Compare the ending explain plans and estimated
TIMERONS (unit of estimated run resource costs).
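One way to record and compare estimated costs is sketched below. It assumes that the explain tables have already been created in the current schema, that the ORDERS table is hypothetical, and that the query number 101 is simply a label chosen for this statement.

```sql
-- Assumptions: explain tables exist in the current schema; ORDERS and
-- QUERYNO 101 are illustrative. Run the EXPLAIN before and after tuning.
EXPLAIN PLAN SET QUERYNO = 101 FOR
SELECT order_id, order_date
  FROM orders
 WHERE customer_id = 1001;

-- List the estimated cost (in timerons) recorded each time statement 101
-- was explained, oldest first, so the baseline and tuned plans can be compared.
SELECT explain_time, total_cost
  FROM explain_statement
 WHERE queryno = 101
   AND explain_level = 'P'
 ORDER BY explain_time;
```

Timerons are estimates, so measured run times from the baseline should be compared alongside them.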
5.5.7 DB2 INDEX ADVISOR
DB2 has a tool to review and recommend INDEXES for a specified Query Workload.
This tool reads a file of SQL Statements and generates a list of used and recommended
indexes for that workload (or statement) as well as a list of unused indexes. The output
of this tool specifies the percent of estimated performance improvement for each new
recommended index and its expected size.
Note that this tool may recommend a list of indexes to add for a given workload or
statement. Adding indexes involves a tradeoff of storage space and processing time.
Be very cautious when adding indexes.
See the IBM DB2 Information Center for further details of this tool.
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.qb.dbconn.doc/doc/c0004770.html
db2advis - DB2 design advisor command
The DB2 Design Advisor advises users on the creation of materialized query tables (MQTs) and indexes, the repartitioning of tables, the conversion to multidimensional clustering (MDC) tables, and the deletion of unused objects.
The recommendations are based on one or more SQL statements provided by the user. A group of related SQL statements is known as a workload. Users can rank the importance of each statement in a workload and specify the frequency at which each statement in the workload is to be executed. The Design Advisor outputs a DDL CLP script that includes CREATE INDEX, CREATE SUMMARY TABLE (MQT), and CREATE TABLE statements to create the recommended objects.
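A workload file for the Design Advisor might be sketched as follows. The file name, tables, database name, and time limit are all assumptions for illustration, and the frequency directive and command options should be verified against the db2advis command reference for the DB2 version in use.

```sql
-- workload.sql: hypothetical workload file for the Design Advisor.
-- Assumed invocation from the operating system prompt (the database name
-- SAMPLE and the 5-minute time limit are placeholders):
--   db2advis -d SAMPLE -i workload.sql -t 5

-- This statement is assumed to run 100 times per workload cycle.
--#SET FREQUENCY 100
SELECT order_id, order_date
  FROM orders
 WHERE customer_id = 1001;

-- This reporting statement is assumed to run far less often.
--#SET FREQUENCY 5
SELECT customer_id, COUNT(*)
  FROM orders
 GROUP BY customer_id;
```

Weighting frequent statements more heavily helps the advisor favor indexes that benefit the bulk of the workload rather than a rarely run query.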
5.6 Explain Tools
DB2 provides two tools for generating Explain Plans for a given SQL statement. These tools are
useful for reviewing and tuning queries as they identify which indexes are being used and where
table scans are being performed.
5.6.1 VISUAL EXPLAIN TOOL
This tool is available from the DB2 Control Center and will graphically display the
Explain Plan for the SQL statement specified.
See the IBM DB2 Information Center for further details of this tool.
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.qb.dbconn.doc/doc/c0004770.html
Visual Explain
Visual Explain lets you view the access plan for explained SQL or XQuery statements as a graph. You can use the information available from the graph to tune your queries for better performance.
Important: Access to Visual Explain through the Control Center tools has been deprecated in Version 9.7 and might be
removed in a future release. For more information, see "Control Center tools have been deprecated". Accessing Visual
Explain functionality through the Data Studio toolset has not been deprecated.
You can use Visual Explain to:
View the statistics that were used at the time of optimization. You can then compare these statistics to the current catalog statistics to help you determine whether rebinding the package might improve performance.
Determine whether or not an index was used to access a table. If an index was not used, Visual Explain can help you determine which columns might benefit from being indexed.
View the effects of performing various tuning techniques by comparing the before and after versions of the access plan graph for a query.
Obtain information about each operation in the access plan, including the total estimated cost and number of rows retrieved (cardinality).
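For the first item in the list above, the current catalog statistics can be read with a query along these lines. The schema name APPSCHEMA and the table names are placeholders invented for this sketch.

```sql
-- Placeholder schema and table names; replace with the objects used by
-- the statement being reviewed.
-- CARD is the row count and NPAGES the page count recorded by the last
-- RUNSTATS; STATS_TIME shows when those statistics were collected.
SELECT tabschema, tabname, card, npages, stats_time
  FROM syscat.tables
 WHERE tabschema = 'APPSCHEMA'
   AND tabname IN ('ORDERS', 'CUSTOMER');
```

If STATS_TIME is later than the time the package was bound, or the row counts differ markedly from those shown in the explained plan, a rebind may be worth testing.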
An access plan graph shows details of:
Tables (and their associated columns) and indexes
Operators (such as table scans, sorts, and joins)
Table spaces and functions.
Note: Visual Explain cannot be invoked from the command line, but only from various database objects in the
Control Center.
To start Visual Explain:
From the Control Center, right-click a database name and select either Show Explained Statements History or Explain Query.
From the Command Editor, execute an explainable statement on the Interactive page or the Script page.
From the Query Patroller, click Show Access Plan from either the Managed Queries Properties notebook or from the Historical Queries Properties notebook.
5.6.2 DB2EXPLN FACILITY
DB2 comes with an operating system level command to generate the Explain Plan for a
given SQL statement.
See the IBM DB2 Information Center for further details of this tool.
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.qb.dbconn.doc/doc/c0004770.html
SQL and XQuery explain tool
The db2expln command describes the access plan selected for SQL or XQuery statements.
You can use this tool to obtain a quick explanation of the chosen access plan when explain data was not captured.
For static SQL and XQuery statements, db2expln examines the packages that are stored in the system catalog. For dynamic SQL and XQuery statements, db2expln examines the sections in the query cache.
The explain tool is located in the bin subdirectory of your instance sqllib directory. If db2expln is not in your current directory, it must be in a directory that appears in your PATH environment variable.
The db2expln command uses the db2expln.bnd, db2exsrv.bnd, and db2exdyn.bnd files to bind itself to a database the first time the database is accessed.
Description of db2expln output
Explain output from the db2expln command includes both package information and section information for each package.
6.0 Appendix