IBM eServer iSeries

© Copyright IBM Corporation, 2005. All Rights Reserved.

This publication may refer to products that are not currently available in your country. IBM makes no commitment to make available any products referred to herein.

Session:

Intro to Query Optimization

DB2 UDB for iSeries

Tom McKinley

IBM Rochester, MN USA

Background / Foundation

Three code bases...

– Based on the system history, architecture and operating system

  • DB2 UDB for Linux, UNIX, Windows (LUW)

  • DB2 UDB for z/OS (S/390)

  • DB2 UDB for iSeries (AS/400)

IBM's DB2 UDB Family

i5 + i5/OS

–System viewed as a database server, not just an application system

–DB2 UDB for iSeries (integrated part of OS/400 or i5/OS)

–Universal Database support

–Data Centric focus

–Business logic moving into the database engine

–SQL (DDL and DML) as primary interface to database

–GUI to operating system and database via iSeries Navigator

DB2 UDB for iSeries

Partition   Operating system   Database
LPAR-1      i5/OS              DB2 UDB for iSeries
LPAR-2      AIX                DB2 UDB for AIX
LPAR-3      Linux              DB2 UDB for Linux
IXS/IXA     Windows***         DB2 UDB for Win

*** No LPAR support

Virtual 1 Gbit Ethernet LAN

iSeries - Logical Partitioning (LPAR)

Single Level Storage

[Diagram labels: QUERY, MEMORY, IOPs, Storage Management, Table]

Single system

Multiple CPUs, N-way SMP

64-bit POWER

iSeries i5 i5/OS Architecture

SQL term             i5/OS term
schema/collection    library
table                physical file
view                 logical file
index                keyed logical file
row                  record
column               field
log                  journal

i5/OS Objects

Library (Schema)

Physical File (Table)

Member 1

Member 2

Member 3

SELECT... FROM Physical File

Alias_1

Alias_2

Alias_3

SELECT... FROM Alias_1

SELECT... FROM Alias_2

SELECT... FROM Alias_3

CREATE ALIAS...

i5/OS Objects

System    Library      Object      Type     Attribute (subtype)
System    My_Schema    DB_Table    *FILE    PF (physical file)
System    My_Schema    DB_Index    *FILE    LF (logical file)
System    My_Schema    DB_View     *FILE    LF (logical file)

CREATE TABLE My_Schema.DB_Table ...

CREATE INDEX My_Schema.DB_Index ...

Must be unique

CREATE VIEW My_Schema.DB_View ...

i5/OS Objects

DB2

DB File (PF) object

CREATE TABLE

High Level Language Native I/O

Structured Query Language (SQL): Embedded, ODBC, JDBC, CLI

Command Language (CL)

One Database Management System with multiple interfaces

CRTPF

SELECT... FROM...

i5/OS Objects

DB2 UDB for iSeries

SQL request: Optimize, Open, Run

SQL Query Processing

Query Optimization

ODBC / JDBC / ADO / DRDA / XDA

Host Server

Static: compiled embedded statements

Dynamic: prepare every time

Extended Dynamic: prepare once and then reference

Optimizer

DB2 UDB (Data Storage & Management)

Native (Record I/O)

SQL

Network

CLI / JDBC

The optimizer and database engine are separated at different layers of the operating system

V5R1 Database Architecture

ODBC / JDBC / ADO .NET / DRDA / XDA

Host Server

Static: compiled embedded statements

Dynamic: prepare every time

Extended Dynamic: prepare once and then reference

Optimizer

DB2 UDB (Data Storage & Management)

Native (Record I/O)

SQL

Network

CLI / JDBC

The optimizer and database engine merged to form the SQL Query Engine, and much of the work was moved to SLIC

V5R2 and V5R3 Database Architecture

Determines which engine will optimize and process each query request

–Only SQL requests are considered for the SQL Query Engine

Initial step for all query optimization that occurs in i5/OS

Ability to “back up” and use the Classic Query Engine when non-standard indexes are encountered during optimization

Initial goal is to use SQE

The Query Dispatcher

Dispatched to CQE if:

– >1 Table (i.e. no joins)
– OR & IN predicates
– SMP requested
– Non-Read (INSERT with subselect can use new path)
– LIKE predicates
– UNIONS
– View or Logical File references
– Subquery
– Derived Tables & Common Table expressions, UDTFs
– LOB columns
– LOWER, TRANSLATE, or UPPER scalar function
– CHARACTER_LENGTH, POSITION, or SUBSTRING scalar function using UTF-8/16
– Sort Sequences & CCSID translation between columns
– Distributed queries via DB2 Multisystem
– Non-SQL queries (QQQQry API, Query/400, OPNQRYF)
– ALWCPYDTA(*NO) specified
– Sensitive Cursor

SQE support added into V5R2 - May 2003 (Latest DB Group + SI07650)

Not part of any package

The Query Dispatcher – V5R2

Dispatched to CQE if:

– LIKE predicates
– Logical File references
– UDTFs
– LOB columns
– LOWER, TRANSLATE, or UPPER scalar function
– CHARACTER_LENGTH, POSITION, or SUBSTRING scalar function using UTF-8/16
– Sort Sequences & CCSID translation between columns
– DB2 Multisystem
– Non-SQL queries (QQQQry API, Query/400, OPNQRYF)
– ALWCPYDTA(*NO) specified
– Sensitive Cursor

SQE now optimizes

– VIEWS, UNIONS, SubQueries
– INSERT, UPDATE, DELETE
– Star Schema Join queries

Only SQE optimizes

– INTERSECT
– EXCEPT

The Query Dispatcher - V5R3
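
The V5R3 routing rules on this slide can be read as a simple test over the features a request uses. The following Python sketch is only an illustration: the feature labels and the `dispatch` function are hypothetical names invented here, not an i5/OS interface, and the real dispatcher inspects the parsed request rather than a set of strings.

```python
# Hypothetical labels, one per entry in the V5R3 "Dispatched to CQE if" list.
CQE_ONLY_FEATURES = {
    "like_predicate",
    "logical_file_reference",
    "udtf",
    "lob_column",
    "lower_translate_upper",
    "utf_char_scalar_function",            # CHARACTER_LENGTH, POSITION, SUBSTRING on UTF-8/16
    "sort_sequence_or_ccsid_translation",
    "db2_multisystem",
    "non_sql_query",                       # QQQQry API, Query/400, OPNQRYF
    "alwcpydta_no",
    "sensitive_cursor",
}


def dispatch(features):
    """Return "CQE" if any listed feature is present, otherwise "SQE"."""
    return "CQE" if CQE_ONLY_FEATURES & set(features) else "SQE"


print(dispatch({"union", "subquery"}))       # SQE
print(dispatch({"join", "like_predicate"}))  # CQE
```

A single CQE-only feature is enough to route the whole request to CQE, which matches the "initial goal is to use SQE" wording: SQE is the default and CQE is the exception path.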

Back up to CQE to complete optimization if any of the following are encountered:

– Select/omit logical file
– Logical file over multiple members
– Join logical file
– Derived key(s)
  • Native logical files that perform some intermediate mapping of the fields referenced in the key. Common ones are renaming fields, adding a translate or only selecting a subset of the columns
  • Specifying an Alternate Collating Sequence (ACS) on a field used for a key will also make a “derived key” (an implied map occurs within the index)
– Sort Sequence (NLSS) specified for index or logical file
  • Probably the trickiest one to detect for users. The index is built while an NLSS table is specified in the query environment
– Cost to “back up” and revert to CQE adds about 15% to the total optimization time
– QAQQINI parameter to ignore unsupported logical files
  • Ignore_Derived_Index = *YES

The Query Dispatcher

The Optimizer

Provides the recipe

Provides the methods

Does no cooking

The Optimizer

Writes the best? program to fulfill your request

Optimization

Server configuration

Server attributes

Version/Release/Modification Level

SMP

Database design

Table sizes, number of rows

Views and Indexes (Radix, EVI)

Work management

Interfaces: Static, Dynamic, Extended Dynamic

SQL Request

Job, Query attributes

Server performance

The Plan

Optimization... the intersection of various factors

The output of query optimization (“the recipe and methods”)

Contents

A control structure that contains information on the actions necessary to satisfy each SQL request

These contents include:

–Access Method

–Info on associated tables and indexes

–Any applicable program and/or environment information

(Query) Access Plans

Cost Based Query Optimization

The DB2 for iSeries Optimizer performs "cost based" optimization

"Cost" is defined as the estimated time it takes to run the request

"Costing" various plans refers to the comparison of a given set of algorithms and methods in an attempt to identify the "fastest" plan

Optimization is based on time, not on resource utilization

Usually the fastest plan is also the most resource efficient plan, but this is not necessarily true

The goal of the optimizer is to eliminate I/O as early as possible by identifying the best path to and through the data

The optimizer has the ability and freedom to "rewrite" the query

Query Optimization
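
To make "time, not resource utilization" concrete, here is a small Python sketch. The plan names and all numbers are invented for illustration, and `CandidatePlan` and `choose_plan` are hypothetical names; the optimizer's real cost model is not described in this session.

```python
from dataclasses import dataclass


@dataclass
class CandidatePlan:
    name: str
    estimated_seconds: float    # "cost" in the sense used on this slide
    estimated_cpu_units: float  # resource use, which is NOT what gets minimized


def choose_plan(plans):
    """Pick the plan with the lowest estimated run time."""
    return min(plans, key=lambda plan: plan.estimated_seconds)


# Made-up estimates: the parallel plan is faster but consumes more CPU.
plans = [
    CandidatePlan("table scan, single task", 12.0, 10.0),
    CandidatePlan("table scan, parallel", 4.0, 18.0),
]
best = choose_plan(plans)
print(best.name)  # table scan, parallel
```

In this made-up case the fastest plan is not the most resource-efficient one, which is the exception the slide warns about.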

Query processing can be divided into four phases:

Query Validation
– Validate the query request
– Validate existing access plan
– Build internal query structures

Query Dispatcher
– Determine which query engine should complete the processing

Query Optimization
– Choose most efficient access method
– Build access plan

Query Execution
– Build the structures needed for query cursor
– Build the structures for any temporary indexes (if needed)
– Build and activate query cursor (ODP)
– Generate any feedback requested
  • Debug messages in the job log
  • DB Monitor records
  • Visual Explain

We can affect this...

Query Phases

Query Optimization

SQL request

DB Monitor Data

Joblog Messages

SQL Info from PGMs & PKGs

Visual Explain

SQE Plan Cache

Query Optimization Feedback

Cost based optimization dictates that the fastest access method for a given table will vary based upon selectivity of the query

[Graph: response time (low to high) plotted against number of rows searched / accessed (few to many), with one curve each for Method 1, Method 2 and Method 3]

Data Access Methods
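
The crossover behaviour in the graph can be reproduced with a toy cost model. The Python sketch below assumes two hypothetical methods, a full table scan and an index probe, with invented per-row costs; the slide does not name Methods 1 to 3, so this is not a description of them.

```python
def scan_cost(total_rows, selected_rows, per_row=1.0):
    # A full scan reads every row, no matter how many qualify.
    return total_rows * per_row


def probe_cost(total_rows, selected_rows, per_row=20.0):
    # An index probe pays a higher price per row, but only for selected rows.
    return selected_rows * per_row


def fastest_method(total_rows, selected_rows):
    """Return the name of the method with the lowest estimated cost."""
    costs = {
        "table scan": scan_cost(total_rows, selected_rows),
        "index probe": probe_cost(total_rows, selected_rows),
    }
    return min(costs, key=costs.get)


for selected in (100, 10_000, 500_000):
    print(selected, fastest_method(1_000_000, selected))
```

With these assumed costs the probe wins while few rows are selected and the scan wins once a large share of the table qualifies, so the best method depends on the selectivity of the query.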

Query optimization will generally follow this simplified strategy:

Gather meta-data and statistics for costing
– Selectivity statistics
– Indexes available to be costed

Sort the indexes based upon their usefulness

Environmental attributes that may affect the costs

Generate default cost
– Build an access plan associated with the default plan

For each index:
– Gather information needed specific to this index
– Build an access plan based on this index
– Cost the use of the index with this access plan
– Compare the resulting cost against the cost from the current best plan

Strategy for Query Optimization
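
The "default plan, then one plan per index, keep the cheapest" loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `optimize`, the index names and the cost figures are all hypothetical, and the real optimizer builds full access plans rather than comparing single numbers.

```python
def optimize(default_cost, indexes, cost_with_index):
    """Return (plan_name, cost) for the cheapest plan found.

    default_cost    -- estimated cost of the default plan
    indexes         -- index names, assumed already sorted by usefulness
    cost_with_index -- callable giving the estimated cost of a plan using an index
    """
    best_plan, best_cost = "default", default_cost
    for index in indexes:
        cost = cost_with_index(index)  # build and cost a plan based on this index
        if cost < best_cost:           # compare against the current best plan
            best_plan, best_cost = index, cost
    return best_plan, best_cost


# Invented estimates for two hypothetical indexes.
estimates = {"IX_CUSTOMER": 40.0, "IX_ORDER_DATE": 15.0}
print(optimize(100.0, list(estimates), estimates.get))  # ('IX_ORDER_DATE', 15.0)
```

If no index beats the default cost, the default plan is kept.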

Optimizing indexes will generally follow this simplified strategy:

Gather list of indexes for statistics and costing

Sort the list of indexes considering how the index can be used
– Local selection
– Joining
– Grouping
– Ordering
– Index only access

One index may be useful for statistics, and another useful for implementation

Strategy for Query Optimization
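
One way to picture the sorting step is to score each index by how many of the listed uses it can serve. The scoring rule below is an assumption made purely for illustration (the session does not say how usefulness is weighed), and the index names are hypothetical.

```python
# The five uses named on this slide.
USES = ("local selection", "joining", "grouping", "ordering", "index only access")


def usefulness(index_uses):
    """Hypothetical score: the number of listed uses an index can serve."""
    return sum(1 for use in USES if use in index_uses)


def sort_indexes(candidates):
    """Sort index names so the most broadly useful come first."""
    return sorted(candidates, key=lambda name: usefulness(candidates[name]), reverse=True)


candidates = {
    "IX_A": {"ordering"},
    "IX_B": {"local selection", "joining", "ordering"},
    "IX_C": {"local selection", "grouping"},
}
print(sort_indexes(candidates))  # ['IX_B', 'IX_C', 'IX_A']
```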

All query optimizers rely upon statistics to make plan decisions

–DB2 UDB for the iSeries has always relied upon indexes as its source for stats

–Other databases rely upon manual stats collection for their source

SQE offers a hybrid approach where column stats will be automatically collected for cases where indexes do not already exist

Statistics

Meta-data sources

–Existing indexes (Radix or Encoded Vector)

  • More accurately describes multi-column key values
  • Stats available immediately as the index maintenance occurs
  • Selectivity estimates from radix by reading n keys
  • Selectivity from EVI by reading symbol table values

–Column Statistics

  • SQE only
  • Column Cardinality, Histograms & Frequent Values List
  • Constructed over a single column in a table
  • Stored internally as a part of the table object after created
  • Collected automatically by default for the system
  • Stats not immediately maintained as the table changes
  • Stats are refreshed as they become “stale” over time

Default sources

–No representation of actual values in columns

Best to worst: existing indexes, column statistics, default sources

Sources of Information
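
The best-to-worst ordering amounts to a fallback chain, sketched below in Python. The function name, its parameters and the column names are hypothetical; the only facts taken from the slide are the order of the sources and that column statistics are SQE only.

```python
def selectivity_source(column, indexed_columns, column_stats, engine="SQE"):
    """Pick the best available statistics source for one column, best first."""
    if column in indexed_columns:
        return "index"              # best: reflects actual key values
    if engine == "SQE" and column in column_stats:
        return "column statistics"  # SQE only, may be stale
    return "default"                # worst: no knowledge of actual values


indexed = {"CUSTOMER_ID"}
stats = {"ORDER_DATE"}
print(selectivity_source("CUSTOMER_ID", indexed, stats))        # index
print(selectivity_source("ORDER_DATE", indexed, stats))         # column statistics
print(selectivity_source("ORDER_DATE", indexed, stats, "CQE"))  # default
```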

i5/OS Statistics collection job

–Reactive, based on query requests

–Automatic collection runs in this background job at very low priority

  • QDBFSTCCOL system job

–Statistics Manager continuously analyzes entries in the Plan Cache and queues up requests for the collection job

–Controlled by system value QDBFSTCCOL

iSeries Navigator graphical interface to manage stats collected by the system

–APIs also provided to manage the stats

SQE Automatic Stats Collection

What is the optimizer's job?

What is the optimizer's output?

What are some of the key elements used for cost based optimization?

What things affect the Access plan?

Look at resources used as well as response time.

Review
