SQL Coding Best Practices for Developers


SQL Coding Best Practices for Developers

Phil Gunning, Principal Consultant, Gunning Technology Solutions, LLC

Session: G2, May 23, 2005, 12:30 – 1:40

Platform: DB2 for Linux, UNIX, and Windows

2

Outline

• Best Practices
• Classes of Predicates
  • Index SARGable
  • Range Delimiting
  • Data SARGable
• Predicate Best Practices
  • Local, Order By, Join Predicates
• Restricting Results
  • Restrict before joining
  • Selectivity
  • DB2 Catalog Queries/Explain

3

Outline

• Index Design
  • Local, Order By, Join predicates
  • Include Columns
  • Uniqueness
• DB2 Visual Explain/db2exfmt/Design Advisor
• Monitor and Evaluate
• Summary

4

Best Practices

1. Use Range Delimiting and Index SARGable predicates wherever possible

2. Understand DB2 predicate rules

3. Specify most restrictive predicates first

4. Select only columns that are needed

5. Adhere to proper index design techniques

6. Understand inputs to the Optimizer

7. Developers and DBAs collaborate to design proper indexes

8. Evaluate all SQL using Visual Explain/db2exfmt

9. Use Design Advisor to tune SQL/SQL Workloads

10. Consistently monitor and review application performance

5

[Diagram: DB2 layers, from Application down through Relational Data Services and Data Management Services to the Data. Residual predicates are evaluated by Relational Data Services, Data SARGable predicates by Data Management Services, and Range Delimiting and Index SARGable predicates by the Index Manager; cost increases the higher the layer at which a predicate must be applied.]

6

Classes of Predicates

• Range Delimiting
• Index SARGable
  • Predicates that can use an index for a search argument
  • Resolved by the Index Manager
• Data SARGable

7

Predicate Example Index

• For the following predicate rule examples, assume that an index named ACCT_INDX has been created on Col A, Col B, and Col C, all ascending:

ACCT_INDX: Col A, Col B, Col C

8

Predicates

• Range Delimiting
  • Used to bracket an index scan
  • Uses start and stop predicates
  • Evaluated by the Index Manager

9

Range Delimiting Example

Col A = 3 and Col B = 6 and Col C = 8

In this case the equality predicates on all the columns of the index can be applied as start-stop keys, and they are all range delimiting.

10

Predicates

• Index SARGable
  • Not used to bracket an index scan
  • Can be evaluated from the index if one is chosen
  • Evaluated by the Index Manager

11

Index SARGable Example

Col A = 9 and Col C = 4

Col A can be used as a range delimiting (start-stop) predicate. Col C can be used as an Index SARGable predicate; it cannot be range delimiting, since there is no predicate on Col B.

Starting with the columns of the index, from left to right, the first column without a predicate (or the first inequality predicate) stops the column matching.

12

Predicates

• Data SARGable
  • Cannot be evaluated by the Index Manager
  • Evaluated by Data Management Services
  • Require access to individual rows of the base table

13

Data SARGable Example

Col A = 3 and Col B <= 6 and Col D = 9

Col A is used as a start-stop predicate, Col B is used as a stop predicate, and Col D, which is not present in the index, is applied as a Data SARGable predicate during the FETCH from the table.

14

Predicates

• Residual Predicates
  • Cannot be evaluated by the Index Manager
  • Cannot be evaluated by Data Management Services
  • Require I/O beyond accessing the base table
  • Include predicates such as those using quantified subqueries (ANY, ALL, SOME, or IN), LONG VARCHAR, or LOB data, which is stored separately from the table
  • Are evaluated by Relational Data Services and are the most expensive type of predicate

15

Residual Predicate Example

Col B = 4 and UDF with external action (Col D)

In this case the leading column, Col A, does not have a predicate, so Col B can only be used as an Index SARGable predicate (the whole index is scanned). Col D involves a user-defined function, which will be applied as a residual predicate.
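The four predicate classes above can be sketched as SQL against the assumed ACCT_INDX index. Table names ACCT and ACCT_HIST and columns COLA–COLD are hypothetical stand-ins for the slides' Col A–Col D, not a real schema:

```sql
-- Assumed index from the slides: ACCT_INDX on ACCT (COLA, COLB, COLC ASC).
-- COLD is a column that is not in the index.

-- Equality predicates on every index column: all range delimiting.
SELECT COLA, COLB, COLC FROM ACCT
WHERE COLA = 3 AND COLB = 6 AND COLC = 8;

-- COLA is range delimiting; COLC is only index SARGable (no predicate on COLB).
SELECT COLA, COLC FROM ACCT
WHERE COLA = 9 AND COLC = 4;

-- COLD is not in the index: its predicate is data SARGable,
-- applied by Data Management Services during the FETCH.
SELECT COLA, COLD FROM ACCT
WHERE COLA = 3 AND COLB <= 6 AND COLD = 9;

-- A quantified subquery predicate is residual, evaluated by
-- Relational Data Services (the most expensive class).
SELECT COLA FROM ACCT
WHERE COLB = 4 AND COLD = ANY (SELECT COLD FROM ACCT_HIST);
```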

16

Rule #1

• Use Range Delimiting and Index SARGable predicates whenever possible

17

Query Rewrite

• The DB2 for Linux, UNIX and Windows optimizer contains significant query rewrite capability

• Still important to write predicates following the local, order by, join rule

• Query rewrite will take care of most transformations that need to be made

• However, if predicates are missing or indexes are not available to support the access paths, your SQL will not be able to take advantage of query rewrite

18


Index Review

• An index is a data structure that contains column values and a pointer to the table data
• Primary key – Unique Index
  • If a primary key is defined, DB2 automatically creates a unique index to enforce the PK constraint
• Secondary Index
  • Created to support access to frequently referenced columns
• Indexes provide efficient access (in terms of CPU and I/O) to columns found in the table
• Just like an index entry in a book, an index in a database enables rapid lookup of the associated table entries

20

Index Characteristics

• Index entries are usually a much smaller subset of all table columns
  • More index entries fit on a page
  • Allows for more efficient use of the buffer pool
• A separate index buffer pool
  • Enables often-used index pages to remain in the buffer pool longer
  • More logical I/O than physical I/O

21

A Word About Index Structures

• A B+-tree is used to store index entries
  • Provides a tree structure that is balanced to a constant depth from the root to the leaf blocks along every branch
  • Usually more efficient (less costly) than a table scan

22

B+-Tree Index Example

[Diagram: a three-level B+-tree — the root node at level 2, intermediate nodes 1–4 at level 1, and leaf nodes (L1.1 through L4.1200) at level 0.]

23

Index-Only Access

[Diagram: the same B+-tree — with index-only access, the query is satisfied entirely from the index pages, with no fetch from the base table.]

24

Range Delimiting (Start – Stop Predicates)

[Diagram: the same B+-tree, with start and stop predicates bracketing a range of leaf pages, followed by fetches of the qualifying rows from the base table.]

25

Select deptnumb, deptname from db2admin.org
Where deptnumb < 20

26

Range Delimiting Example

27

Full Table Scan Query

Created this index and ran this SQL:

CREATE INDEX "DB2ADMIN"."YYZZ" ON "DB2ADMIN"."ORG"
  ("DEPTNUMB" ASC, "DEPTNAME" ASC, "DIVISION" ASC)

Select deptnumb, deptname from db2admin.org
Where deptnumb = 20 and deptname like 'b%'
   or division = 'midwest' and manager = 88
   or location like 'bo%'

28

Table Scan Example

29

Table Scan Example

30

Table Scan Rules of Thumb

• If more than 20–25% of the rows will be read, there is a good likelihood of a table scan

• If 0.5–20% of the rows are read, index access is likely, but this can vary depending on numerous factors

• Exact formulas used are complex and not very useful for practical purposes

31

Rule #2

• Understand and apply DB2 predicate rules

32

WITH DEPT_MGR AS (
  SELECT DEPTNO, DEPTNAME, EMPNO, LASTNAME, FIRSTNME, PHONENO
  FROM DEPARTMENT D, EMPLOYEE E
  WHERE D.MGRNO = E.EMPNO AND E.JOB = 'MANAGER'
),
DEPT_NO_MGR AS (
  SELECT DEPTNO, DEPTNAME, MGRNO AS EMPNO
  FROM DEPARTMENT
  EXCEPT ALL
  SELECT DEPTNO, DEPTNAME, EMPNO
  FROM DEPT_MGR
),
MGR_NO_DEPT (DEPTNO, EMPNO, LASTNAME, FIRSTNME, PHONENO) AS (
  SELECT WORKDEPT, EMPNO, LASTNAME, FIRSTNME, PHONENO
  FROM EMPLOYEE
  WHERE JOB = 'MANAGER'
  EXCEPT ALL
  SELECT DEPTNO, EMPNO, LASTNAME, FIRSTNME, PHONENO
  FROM DEPT_MGR
)
SELECT DEPTNO, DEPTNAME, EMPNO, LASTNAME, FIRSTNME, PHONENO
FROM DEPT_MGR
UNION ALL
SELECT DEPTNO, DEPTNAME, EMPNO,
       CAST(NULL AS VARCHAR(15)) AS LASTNAME,
       CAST(NULL AS VARCHAR(12)) AS FIRSTNME,
       CAST(NULL AS CHAR(4)) AS PHONENO
FROM DEPT_NO_MGR
UNION ALL
SELECT DEPTNO,
       CAST(NULL AS VARCHAR(29)) AS DEPTNAME,
       EMPNO, LASTNAME, FIRSTNME, PHONENO
FROM MGR_NO_DEPT
ORDER BY 4

33

A More “Complicated” Example

34

Table Scan Example

35

Created Two Indexes

CREATE INDEX "DB2ADMIN"."AABB" ON "DB2ADMIN"."DEPARTMENT"
  ("DEPTNO" ASC, "DEPTNAME" ASC, "MGRNO" ASC)
  PCTFREE 10 CLUSTER MINPCTUSED 10
  ALLOW REVERSE SCANS;

CREATE INDEX "DB2ADMIN"."CCDD" ON "DB2ADMIN"."EMPLOYEE"
  ("EMPNO" ASC, "FIRSTNME" ASC, "MIDINIT" ASC,
   "LASTNAME" ASC, "WORKDEPT" ASC, "PHONENO" ASC)
  PCTFREE 10 MINPCTUSED 10
  ALLOW REVERSE SCANS;

36

Index Scan Example

Index scan on the AABB index

37

Full Index Scan

Index scan of the entire index, then a fetch from the table

38

Index Scan

[Diagram: the same B+-tree, with an index on DEPTNAME, DEPTNO, MGRNO — the leaf pages yield RIDs that are used to fetch the rows from the base table.]

39

Selectivity Catalog Queries

SELECT INDNAME, NPAGES, CARD,
       FIRSTKEYCARD AS FIRSTK, FIRST2KEYCARD AS F2KEY,
       FIRST3KEYCARD AS F3KEY, FIRST4KEYCARD AS F4KEY,
       FULLKEYCARD AS FULLKEY, NLEAF, NLEVELS AS NLEV,
       CLUSTERRATIO AS CR, CLUSTERFACTOR AS CF,
       UNIQUERULE AS U, T.COLCOUNT AS TBCOL, I.COLCOUNT AS IXCOL
FROM SYSCAT.TABLES T, SYSCAT.INDEXES I
WHERE T.TABSCHEMA = I.TABSCHEMA
  AND T.TABSCHEMA = 'PGUNNING'
  AND T.TABNAME = I.TABNAME
  AND CARD > 20000
ORDER BY CARD DESC, 1;

40

INDNAME          NPAGES CARD    FIRSTK F2KEY  F3KEY   F4KEY   FULLKEY NLEAF  NLEV CF        U TBCOL IXCOL COLNAMES
PSAITEM_DST      516893 5682718 3      9      21      7339    7339    6886   3    8.90E-01  D 53    5     +APPL_JRNL_ID+BUSINESS_UNIT+GL_DISTRIB_STATUS+ACCOUNTING_DT+FOREIGN_CURRENCY
PSBITEM_DST      516893 5682718 1207   -1     -1      -1      1207    7412   3    9.00E-01  D 53    1     +PROCESS_INSTANCE
PS_ITEM_DST      516893 5682718 3      14947  1313902 1313906 5682718 136107 5    8.33E-01  U 53    8     +BUSINESS_UNIT+CUST_ID+ITEM+ITEM_LINE+ITEM_SEQ_NUM+LEDGER_GROUP+LEDGER+DST_SEQ_NUM
PS_PENDING_DST   516882 5682569 3      146890 148686  380119  5682569 172775 5    9.18E-01  U 49    10    +GROUP_BU+GROUP_ID+BUSINESS_UNIT+CUST_ID+ITEM+ITEM_LINE+GROUP_SEQ_NUM+LEDGER_GROUP+LEDGER+DST_SEQ_NU
PSAITEM_ACTIVITY 577065 2884167 3      2959   -1      -1      2959    3667   3    7.77E-01  D 91    2     +BUSINESS_UNIT+ACCOUNTING_DT
PSBITEM_ACTIVITY 577065 2884167 -1     -1     -1      -1      -1      -1     -1   -1.00E+00 D 91    4     +DEPOSIT_BU+DEPOSIT_ID+PAYMENT_SEQ_NUM+PAYMENT_ID
PS_ITEM_ACTIVITY 577065 2884167 3      14947  1313902 1313906 2884167 50034  4    7.15E-01  U 91    5     +BUSINESS_UNIT+CUST_ID+ITEM+ITEM_LINE+ITEM_SEQ_NUM
PSAJRNL_LN       119531 1433651 543    1865   1865    -1      1865    1755   3    7.70E-01  D 44    3     +ACCOUNT+BUSINESS_UNIT+CURRENCY_CD
PSBJRNL_LN       119531 1433651 56152  60820  65131   65268   65268   2184   3    9.86E-01  D 44    4     +JOURNAL_ID+JOURNAL_DATE+BUSINESS_UNIT+UNPOST_SEQ
PSDJRNL_LN       119531 1433651 12831  14473  116514  -1      116514  2538   3    8.58E-01  D 44    3     +PROCESS_INSTANCE+BUSINESS_UNIT+ACCOUNT
PSFJRNL_LN       119531 1433651 1318   3654   3705    65268   65270   2223   3    9.68E-01  D 44    5     +JOURNAL_DATE+BUSINESS_UNIT+UNPOST_SEQ+JOURNAL_ID+JRNL_LINE_SOURCE

NLEVELS > 3

41

XBOOKING1 Selectivity = Number of Distinct Values / CARD

1229/389151 = .003

Meets our rule for selectivity < .10

XBOOKING2 Selectivity = Number of Distinct Values / CARD

111217/389151 = .285

Does not meet our rule for selectivity < .10

42

Data Specification

• Specify the most restrictive predicates first
• Select only those columns needed
• Use business sense when developing reports for end users
  • They should not be so voluminous that the average end user will not be able to use them anyway
  • Haven't we all seen monster reports that consume lots of CPU and I/O and never get looked at?

43

Restriction

• Restrict before you join
• Example:

Select * from acct, dept
where acct.nbr = dept.acct_id
and acct.loc = 5

• In this example, acct.loc = 5 is a restrictive expression
• It will be applied before the join, so the number of rows joined decreases
• With just the join expression, the number of rows increases, as many rows in dept might match the ones in acct

44

Fast Retrieval

• OPTIMIZE FOR N ROWS clause
  • Can guide the optimizer to use an access path that quickly returns N rows
  • Also affects the number of rows blocked in the communications buffer
  • Useful when the number of rows you want is significantly less than the total number of rows that could be returned
  • Can slow performance if most of the rows are going to be processed

45

Fetch First

• FETCH FIRST N ROWS ONLY clause
  • Restricts fetching to only N rows, regardless of the number of rows that would otherwise have been in the result set
• FOR FETCH ONLY clause
  • Use when no updates are planned
  • Query can take advantage of row blocking
  • Only S locks are taken on rows retrieved
  • Improved concurrency
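The clauses above can be combined in one statement; a sketch against the ORG table used earlier, with illustrative values, and clause order following the DB2 select-statement syntax as I read it:

```sql
-- Return at most 10 rows, declare the query read-only so row blocking
-- and S locks can be used, and hint that only 10 rows are wanted.
SELECT deptnumb, deptname
FROM db2admin.org
WHERE deptnumb < 50
ORDER BY deptnumb
FETCH FIRST 10 ROWS ONLY
FOR FETCH ONLY
OPTIMIZE FOR 10 ROWS;
```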

46

Rule #3 & 4

• Specify the most restrictive predicates first
• Select only those columns needed

47

Selectivity

• The selectivity of an index column indicates the number of rows that will satisfy the predicate condition
• Formula:
  • Selectivity = number of distinct values / number of rows in the table
• Selectivity of predicates should be < .10; that is, a predicate should return less than 10% of the table rows to the requesting application, or to the intermediate result set if part of more than a two-way join
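The formula can be computed directly from catalog statistics; a sketch, assuming current RUNSTATS and using COLCARD as the distinct-value count (the PGUNNING schema follows the earlier catalog query):

```sql
-- Selectivity = number of distinct values (COLCARD) / table rows (CARD).
-- COLCARD and CARD are -1 when statistics have not been collected.
SELECT C.TABNAME, C.COLNAME,
       DECIMAL(C.COLCARD, 15, 4) / T.CARD AS SELECTIVITY
FROM SYSCAT.COLUMNS C, SYSCAT.TABLES T
WHERE C.TABSCHEMA = T.TABSCHEMA
  AND C.TABNAME = T.TABNAME
  AND T.TABSCHEMA = 'PGUNNING'
  AND C.COLCARD >= 0
  AND T.CARD > 0
ORDER BY SELECTIVITY;
```

Columns with a result below .10 are candidates under the rule above.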

48

Index Design

• Indexes should be created on local, order by, and join predicates, and on foreign keys
• Primary key unique index is created by default
  • Many times the primary key will be used by the optimizer as the driving index due to its uniqueness and selectivity
• Frequently accessed columns with good selectivity
• Avoid redundant indexes, as they are typically not used or offer additional unnecessary choices to the optimizer
• Number of indexes
  • Determined by business rules
  • OLTP: 3 indexes
    • Fewer indexes offer fewer choices to the optimizer
  • Mixed (ERP/CRM/SCM): 3–5 indexes
  • DW: 5 or more

49

Rule #5

• Adhere to proper index design techniques

50

DB2 Optimizer

• What inputs does the Optimizer consider/analyze during statement optimization?
• Important to know, as some of these inputs can cause suboptimal access paths if not current:
  • RUNSTATS not current
  • Buffer pool changes
  • Configuration parameter changes

51

• Buffer pool size (NPAGES) — to determine how much of the buffer pool may be available for the tables/indexes involved
• SORTHEAP DB CFG parameter — to determine if a piped sort can be used
• LOCKLIST — to determine the amount of memory available for storing locks for this access plan
• CPU speed — speed of the CPUs available
• PREFETCHSIZE — to determine I/O costs
• INTRA_PARALLEL DBM CFG parameter — to determine if parallelism may be used
• Type of table space and number of containers — to determine I/O costs and the degree of I/O parallelism
• SHEAPTHRES — to determine the maximum amount of shared SORTHEAP available
• Disk speed — to estimate I/O costs

52

• Degree of clustering — to determine the effectiveness of prefetching and how clustered the data is
• Indexes available — to determine index access cost
• DFT_DEGREE — default degree of parallelism
• AVG_APPLS — to determine the amount of buffer pool space available for a query
• MAXLOCKS — percent of the LOCKLIST used by a single application before lock escalation occurs
• LOCKLIST — size of the memory area reserved for locks
• DFT_QUERYOPT — the default optimization class to be used
• STMTHEAP — size can affect the amount of optimization conducted
• COMM_BANDWIDTH — used for partitioned databases
• MAX_QUERYDEGREE — maximum number of subagents to be used if INTRA_PARALLEL is enabled

53

REOPT Bind Option

• Can be used to enable query reoptimization for dynamic and static SQL that use host variables, parameter markers, or special registers
• The REOPT option can be set to one of three values:
  • NONE — no reoptimization takes place; the default behavior
  • ONCE — the access plan is optimized using the real values the first time the statement executes, and the plan is cached in the package cache
  • ALWAYS — the access plan is always compiled and reoptimized using the values of the parameter markers, host variables, or special registers known at each execution

54

Lock Wait Mode

• An application can specify an individual lock wait mode strategy
• Take one of the following actions when it cannot obtain a lock:
  • Return an SQLCODE or SQLSTATE
  • Wait indefinitely for a lock
  • Wait a specified amount of time for a lock
  • Use the value of the LOCKTIMEOUT DB CFG parameter
• SET CURRENT LOCK TIMEOUT statement
  • Specifies the number of seconds to wait for a lock
  • Applies to row, table, index key, and MDC block locks
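The four strategies above map onto the statement's settings; a sketch with illustrative values:

```sql
-- Wait at most 30 seconds for any row, table, index key, or MDC block lock.
SET CURRENT LOCK TIMEOUT 30;

-- Other settings: WAIT (wait indefinitely), NOT WAIT (return an error
-- immediately), NULL (fall back to the LOCKTIMEOUT DB CFG value).
SET CURRENT LOCK TIMEOUT NOT WAIT;
```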

55

KEEP UPDATE LOCKS

• A lock type can be specified for queries that perform updates
• Allows FOR UPDATE cursors to take advantage of row blocking
• RR or RS can be used when querying a read-only result table
• Allows positioned cursor updates to succeed
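A sketch of the lock-request clause this refers to, using the hypothetical ACCT table from earlier examples:

```sql
-- Read with RS isolation and keep update locks on the rows retrieved,
-- so a later positioned update does not have to re-acquire them.
SELECT acct_nbr, acct_loc
FROM acct
WITH RS USE AND KEEP UPDATE LOCKS;
```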

56

Rule #6

• Understand inputs to the DB2 Optimizer

57

DB2 Visual Explain

• Use it as part of the testing and development process
• Developers can use any type of DB2 explain or other SQL analysis tool, but it must be integrated into the development process
• DBAs also use all types of explain in support of application development, testing, and fixing production problems
• Evaluate all SQL using Visual Explain/db2exfmt or some type of explain tool
• Monitor on a recurring basis
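As a sketch of the db2exfmt route (database and file names are placeholders, and the explain tables must already exist in the connected database):

```sql
-- Capture the access plan for a statement into the explain tables...
EXPLAIN PLAN FOR
SELECT deptnumb, deptname FROM db2admin.org WHERE deptnumb < 20;

-- ...then format the most recent plan from the command line:
--   db2exfmt -d mydb -1 -o plan.txt
```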

58

Rule #7

• Developers and DBAs collaborate to develop applications that perform when implemented in production

59

Design Advisor

• DBAs work with developers using Design Advisor to evaluate individual SQL statements and workloads to identify possible index solutions, clustering indexes, MDC indexes, and MQT recommendations

• DBAs use the package cache option to look for high cost SQL in the package cache

• Best used as part of physical design process but use is ongoing

60

db2advis -d gunprd -i wildsortsql.txt > wildsqlixadvout.txt

execution started at timestamp 2004-08-12-10.25.44.141157
found [1] SQL statements from the input file
Calculating initial cost (without recommended indexes) [23866.660156] timerons
Initial set of proposed indexes is ready.
Found maximum set of [1] recommended indexes
Cost of workload with all indexes included [75.079346] timerons
total disk space needed for initial set [ 4.747] MB
total disk space constrained to [ -1.000] MB
1 indexes in current solution
[23866.6602] timerons (without indexes)
[ 75.0793] timerons (with current solution)
[%99.69] improvement
Trying variations of the solution set.
-- execution finished at timestamp 2004-08-12-10.25.45.932376
--
-- LIST OF RECOMMENDED INDEXES
-- ===========================
-- index[1], 4.747MB
   CREATE INDEX WIZ1 ON "PUSER "."T_FILE" ("FILE_STATUS_NUM" DESC) ;
-- ===========================
--
-- Design Advisor tool is finished.

61

Rule #8

• Evaluate all SQL using Visual Explain/db2exfmt or some type of explain tool

• Use Design Advisor to tune SQL statements and workloads

62

RUNSTATS

• RUNSTATS should be run on a regular schedule
  • After a reorg, a change in PREFETCHSIZE, a static SQL change, or growth in data
• What is a regular schedule?
  • Nightly or weekly, depending on many things
  • Data changes by 10% or more
  • Indexes changed or added
• Use SAMPLING in V8.1+, and the WITH DISTRIBUTION clause
  • If queries are simple and straightforward and there is no skewed data, don't use WITH DISTRIBUTION
• In databases with all dynamic SQL, like most ERP, CRM, and SCM packages today, RUNSTATS may be needed nightly if data changes as noted above
• Most shops that get consistent performance schedule RUNSTATS either nightly or weekly
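A sketch of such a RUNSTATS invocation, using the ORG table from the earlier examples (the TABLESAMPLE clause assumes DB2 V8.2 or later):

```sql
-- Distribution statistics plus detailed index statistics, computed on
-- a sample of the rows to keep the run cheap on large tables.
RUNSTATS ON TABLE DB2ADMIN.ORG
  WITH DISTRIBUTION AND DETAILED INDEXES ALL
  TABLESAMPLE BERNOULLI (10);
```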

63

Monitoring

• If you don't know how performance was, you can't tell why it is suddenly bad
• Application developers can monitor the performance of applications under their control by:
  • Monitoring service levels
  • Periodic explains in production
  • Checking with end users/help desk
  • Querying the performance repository
• DBAs should implement and maintain a continuous monitoring program using snapshots, a snapshot repository, and event monitors when needed

64

Monitoring

• Compute performance metrics on an hourly/daily basis, and track and evaluate them over time
• With such a system established, you will be able to answer questions such as "What was this SQL running like yesterday, last week, and last month?"
• Were there any database problems today?
• Why has this query suddenly gone from running in 30 seconds to 30 hours?

65

Rule #10

• Implement a monitoring solution that provides both real-time and historical performance data

• Build canned reports to identify top 10 SQL statements, in terms of User CPU, System CPU, rows read, sorts, and sort time

• Implement a “closed loop” system where problems are tracked until they are resolved

66

67

SQL Coding Best Practices for Developers
Session: G2

THANK YOU!

Phil Gunning
Gunning Technology Solutions, LLC

pgunning@gunningts.com