Database Systems – SQL


SQL OPTIMIZATION

Writing efficient queries requires understanding what affects the performance of your query.

In general, hardware can have a profound effect on performance.

There are some basic physical constraints that databases must work around, especially with non-solid-state (spinning) disks. Solid-state storage changes the figures below, and its specifications are constantly being rewritten.

Disk seeks – average about 10 ms, i.e. roughly 100 seeks per second. The only real optimization is to distribute data across multiple disks; the seek time for a single table is hard to improve.

Disk reading and writing – 10 to 20 MB per second. The best way to optimize is to distribute data across multiple disks.

Disk spindles – the greater the number of spindles, the greater the opportunity for the database to read and write in parallel.

CPU cycles – data must be processed. Smaller data fits in memory and is therefore faster to process.
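A back-of-envelope sketch using the figures above shows why seeks dominate on spinning disks. The 8 KB page size, the 10,000-page workload, and the helper names are illustrative assumptions, not measurements.

```python
# Rough I/O estimates from the spinning-disk figures above.
SEEK_SECONDS = 0.010          # ~10 ms average seek (from the text)
TRANSFER_MB_PER_SECOND = 20   # upper end of the 10-20 MB/s range (from the text)
PAGE_KB = 8                   # assumed page size, for illustration only


def random_read_seconds(pages: int) -> float:
    """Each page costs its own seek plus the time to transfer it."""
    transfer = pages * PAGE_KB / 1024 / TRANSFER_MB_PER_SECOND
    return pages * SEEK_SECONDS + transfer


def sequential_read_seconds(pages: int) -> float:
    """One seek to reach the start, then one continuous transfer."""
    return SEEK_SECONDS + pages * PAGE_KB / 1024 / TRANSFER_MB_PER_SECOND


if __name__ == "__main__":
    pages = 10_000  # about 78 MB of 8 KB pages
    print(f"random:     {random_read_seconds(pages):.1f} s")
    print(f"sequential: {sequential_read_seconds(pages):.1f} s")
```

Under these assumptions, 10,000 scattered page reads take about 104 seconds, nearly all of it spent seeking, while reading the same pages sequentially takes about 4 seconds.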

A bigger issue is your database design and how your query executes.

Let's start by understanding how Postgres executes a query:

1. Transmitting the SQL string to the database backend.
2. Parsing the query string.
3. Planning of the query to optimize the retrieval of data.
4. Retrieval of data from the hardware.
5. Transmission of the data to the client.

Source: http://www.revsys.com/writings/postgresql-performance.html. Note that the source includes additional comments on general database tuning; they are skipped here because we do not have control over our database environment.

Transmitting the SQL string to the database backend

This is not a particularly lengthy step. It requires the actual characters you type to be transferred to the database. If a query is exceptionally long, it may be placed in a stored procedure to save transmission time.

Parsing the query string

The text you typed must be broken into tokens. Again, if this takes longer than desired, a stored procedure may be used to save parsing time.

Planning of the query to optimize the retrieval of data

If your query has already been prepared, this can save a lot of time. Otherwise the planner must determine the best way to execute the query: should it use one or more indexes, or would a hash join be more efficient?
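As a runnable analogy, Python's built-in sqlite3 module keeps a per-connection cache of compiled statements (the `cached_statements` argument to `sqlite3.connect`), so reusing identical SQL text with `?` placeholders lets it skip re-parsing. SQLite is used only because it ships with Python; in Postgres itself the equivalent is `PREPARE`/`EXECUTE` or a driver's server-side prepared statements. The table and rows are hypothetical.

```python
import sqlite3

# cached_statements: how many compiled statements this connection keeps for reuse.
conn = sqlite3.connect(":memory:", cached_statements=128)
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO books (author_id, title) VALUES (?, ?)",
    [(16, "Title A"), (16, "Title B"), (7, "Title C")],  # hypothetical rows
)

# The SQL text never changes; only the bound parameter does, so the connection
# can reuse its cached, already-compiled statement instead of re-parsing.
# Formatting values into the string would defeat this (and invite SQL injection).
TITLES_BY_AUTHOR = "SELECT title FROM books WHERE author_id = ? ORDER BY title"


def titles_for(author_id: int) -> list[str]:
    return [row[0] for row in conn.execute(TITLES_BY_AUTHOR, (author_id,))]


print(titles_for(16))  # ['Title A', 'Title B']
print(titles_for(7))   # ['Title C']
```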

Retrieval of data from the hardware

There isn't much you can do here other than improve your hardware.

Transmission of the data to the client

There isn't much you can do here other than minimize the number of columns and rows returned in the result set.

The more extensive your permissions are, the less optimized your database will be.

Table-level and column-level permissions, resource counting, etc. can be problematic if you have a large number of statements being executed. However, if you have a few queries that each process a large amount of data, this factor may not be as significant.

USING EXPLAIN

Using EXPLAIN helps you understand how your query executes. It tells you the order in which tables are joined and which indexes are used to join them. If you notice joins on unindexed fields, you can index them to improve performance.

To run EXPLAIN, simply type:

EXPLAIN SqlQuery;
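SQLite's counterpart is `EXPLAIN QUERY PLAN`. The sketch below calls it through Python's sqlite3 module simply because that runs anywhere; its planner and output format differ from Postgres and MySQL, and the helper `plan` and table `t` are hypothetical.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the readable 'detail' text of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
print(plan(conn, "SELECT b FROM t WHERE a = ?", (1,)))  # no index yet: a full scan of t

conn.execute("CREATE INDEX t_a ON t(a)")
print(plan(conn, "SELECT b FROM t WHERE a = ?", (1,)))  # now a search using t_a
```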

Suppose that you have the tables shown here and that you plan to examine a SELECT statement against them using EXPLAIN ANALYZE:

CREATE TABLE authors (
    id int4 PRIMARY KEY,
    name varchar
);

CREATE TABLE books (
    id int4 PRIMARY KEY,
    author_id int4,
    title varchar
);

Try analyzing the following query:

EXPLAIN ANALYZE SELECT authors.name, books.title FROM books, authors WHERE books.author_id=16 and authors.id = books.author_id ORDER BY books.title;

We get:

QUERY PLAN
--------------------------------------------------------------------------------
 Sort (cost=29.71..29.73 rows=6 width=64) (actual time=0.189..16.233 rows=7 loops=1)
   Sort Key: books.title
   -> Nested Loop (cost=0.00..29.63 rows=6 width=64) (actual time=0.068..0.129 rows=7 loops=1)
         -> Index Scan using authors_pkey on authors (cost=0.00..5.82 rows=1 width=36) (actual time=0.029..0.033 rows=1 loops=1)
               Index Cond: (id = 16)
         -> Seq Scan on books (cost=0.00..23.75 rows=6 width=36) (actual time=0.026..0.052 rows=7 loops=1)
               Filter: (author_id = 16)
 Total runtime: 16.386 ms
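For comparison, the same schema and query can be passed to SQLite's `EXPLAIN QUERY PLAN` from Python. This is only an analogy: SQLite's planner and output differ from Postgres's, and because `int4 PRIMARY KEY` is not SQLite's special `INTEGER PRIMARY KEY`, the key appears as an automatically created index named `sqlite_autoindex_authors_1` rather than `authors_pkey`. The plan typically shows books being read without an index, an index search on authors, and a temporary B-tree for the ORDER BY, though the exact wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id int4 PRIMARY KEY, name varchar);
    CREATE TABLE books (id int4 PRIMARY KEY, author_id int4, title varchar);
    """
)

QUERY = """
    SELECT authors.name, books.title
    FROM books, authors
    WHERE books.author_id = 16 AND authors.id = books.author_id
    ORDER BY books.title
"""

# EXPLAIN QUERY PLAN describes the chosen plan without running the query.
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for line in details:
    print(line)
```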

Read the plan from the bottom up.

We first see the complete query time:

Total runtime: 16.386 ms

Postgres then performs a sequential scan on the books table, filtering for rows that have an author_id of 16:

-> Seq Scan on books (cost=0.00..23.75 rows=6 width=36) (actual time=0.026..0.052 rows=7 loops=1)

Filter: (author_id = 16)

Although no explicit index was created on the authors table, an implicit one exists because of the primary key. Postgres therefore uses it to select the author whose key equals 16:

-> Index Scan using authors_pkey on authors (cost=0.00..5.82 rows=1 width=36) (actual time=0.029..0.033 rows=1 loops=1)

Index Cond: (id = 16)

The final results are sorted by the book's title:

Sort (cost=29.71..29.73 rows=6 width=64) (actual time=0.189..16.233 rows=7 loops=1)

Sort Key: books.title

-> Nested Loop (cost=0.00..29.63 rows=6 width=64) (actual time=0.068..0.129 rows=7 loops=1)

Note that both the planner's estimates (cost in arbitrary planner units, rows, width) and the actual measurements (actual time in milliseconds, rows, loops) are listed in parentheses.

Let's add an index:

CREATE INDEX books_idx1 on books(author_id);

If you rerun the EXPLAIN query, would you expect the performance to improve?

It may not, at least until you refresh the planner's statistics by running:

ANALYZE books;

However, you are still not guaranteed that the index will be used. If there are only a small number of records in books, Postgres may still perform a sequential scan.
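A rough SQLite analogue of this workflow through Python (the 1,000 sample rows are made up): check the plan, create `books_idx1`, run `ANALYZE` so statistics land in SQLite's `sqlite_stat1` table, and check again. SQLite's planner is not Postgres's, so this illustrates the steps rather than predicting what Postgres will decide for your table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id int4 PRIMARY KEY, author_id int4, title varchar)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [(i, i % 50, f"Book {i}") for i in range(1, 1001)],  # hypothetical: 50 authors, 20 books each
)

SQL = "SELECT title FROM books WHERE author_id = 16"


def plan_details() -> list[str]:
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + SQL)]


before = plan_details()  # no index on author_id yet, so the table is scanned
conn.execute("CREATE INDEX books_idx1 ON books(author_id)")
conn.execute("ANALYZE")  # stores row-count statistics in sqlite_stat1
after = plan_details()

print("before:", before)
print("after: ", after)
print(conn.execute("SELECT idx, stat FROM sqlite_stat1 WHERE idx = 'books_idx1'").fetchall())
```

The `stat` value starts with the number of rows in the index followed by the average number of rows per author_id, which is the kind of information the planner uses to weigh an index search against a scan.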

What other reasons might prevent Postgres from using an index?

Problem: The planner has decided it is faster to do a table scan than an index scan. This can happen if a) your table is relatively small, or b) the field you are indexing has a lot of duplicates. Case in point: boolean fields are not terribly useful to index when roughly 50% of your data has one value and 50% the other. Solution: they are, however, good candidates for partial indexes, e.g. to index only the rows that are active.
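A partial index covers only the rows that satisfy its WHERE clause. The sketch below, with a hypothetical `accounts` table and `active` flag, uses SQLite through Python because it also supports partial indexes with the same `CREATE INDEX ... WHERE ...` shape as Postgres. The planner can use such an index only when the query's WHERE clause implies the index's condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)")
# Index only the active rows; inactive rows take no space in the index.
conn.execute("CREATE INDEX idx_active_email ON accounts(email) WHERE active = 1")


def plan(sql: str) -> list[str]:
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# The query repeats the index's condition, so the partial index qualifies.
WITH_FLAG = "SELECT id FROM accounts WHERE active = 1 AND email = 'a@example.com'"
# Without active = 1, inactive rows could match too, so the index cannot be used.
WITHOUT_FLAG = "SELECT id FROM accounts WHERE email = 'a@example.com'"

print(plan(WITH_FLAG))
print(plan(WITHOUT_FLAG))
```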

Problem: You set up an index that is incompatible with how you are actually filtering a field. There are a couple of variants of this situation. The classic one: LIKE '%me' will never use an ordinary B-tree index, but LIKE 'me%' can possibly use one.

The upper/lower trap: you defined your index like this:

CREATE INDEX idx_faults_name ON faults USING btree(fault_name);

but you are running a query like this:

SELECT * FROM faults WHERE UPPER(fault_name) LIKE 'CAR%';

Possible fix (replace the index with one on the expression you actually filter on):

DROP INDEX idx_faults_name;
CREATE INDEX idx_faults_name ON faults USING btree(upper(fault_name));

If the database does not use the C locale, a LIKE prefix search can only use a B-tree index built with the text_pattern_ops operator class, e.g. btree(upper(fault_name) text_pattern_ops).

http://www.postgresonline.com/journal/archives/78-Why-is-my-index-not-being-used.html
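Both pitfalls can be reproduced in SQLite through Python. Two SQLite-specific details are assumptions of this sketch, not Postgres behavior: SQLite's LIKE is case-insensitive by default, so its prefix optimization needs the column indexed with `COLLATE NOCASE`; and SQLite supports indexes on expressions such as `upper(fault_name)` in place of Postgres's `btree(upper(...))`. The expression-index check uses equality rather than LIKE to stay independent of SQLite's LIKE rules. The `severity` column and the helper `uses()` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE lets SQLite's case-insensitive LIKE use an index on the column.
conn.execute(
    "CREATE TABLE faults (id INTEGER PRIMARY KEY, fault_name TEXT COLLATE NOCASE, severity INTEGER)"
)
conn.execute("CREATE INDEX idx_faults_name ON faults(fault_name)")


def uses(index: str, sql: str) -> bool:
    """True if SQLite's EXPLAIN QUERY PLAN mentions the given index."""
    return any(index in row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# 1. Fixed prefix vs. leading wildcard.
PREFIX = "SELECT * FROM faults WHERE fault_name LIKE 'car%'"
SUFFIX = "SELECT * FROM faults WHERE fault_name LIKE '%car'"
print(uses("idx_faults_name", PREFIX))  # True: the prefix becomes an index range
print(uses("idx_faults_name", SUFFIX))  # False: a leading wildcard forces a scan

# 2. Wrapping the column in a function hides the plain index...
UPPER_EQ = "SELECT * FROM faults WHERE upper(fault_name) = 'CAR'"
print(uses("idx_faults_name", UPPER_EQ))  # False
# ...but an index on exactly the same expression can be used.
conn.execute("CREATE INDEX idx_faults_upper ON faults(upper(fault_name))")
print(uses("idx_faults_upper", UPPER_EQ))  # True
```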

For this example (run in MySQL), make the following assumptions:

The columns being compared have been declared as follows:

Table  Column      Data Type
tt     ActualPC    CHAR(10)
tt     AssignedPC  CHAR(10)
tt     ClientID    CHAR(10)
et     EMPLOYID    CHAR(15)
do     CUSTNMBR    CHAR(15)

The tables have the following indexes:

Table  Index
tt     ActualPC
tt     AssignedPC
tt     ClientID
et     EMPLOYID (primary key)
do     CUSTNMBR (primary key)

OPTIMIZATION / MySQL

The tt.ActualPC values are not evenly distributed.

Initially, before any optimizations have been performed, the EXPLAIN statement produces the following information:

table  type  possible_keys  key   key_len  ref   rows  Extra
et     ALL   PRIMARY        NULL  NULL     NULL  74
do     ALL   PRIMARY        NULL  NULL     NULL  2135
et_1   ALL   PRIMARY        NULL  NULL     NULL  74
tt     ALL   AssignedPC,    NULL  NULL     NULL  3872  range checked for each record (key map: 35)
             ClientID,
             ActualPC

Because type is ALL for each table, this output indicates that MySQL is generating a Cartesian product of all the tables.

For the case at hand, this product is 74 × 2135 × 74 × 3872 = 45,268,558,720 rows.

One problem here is that MySQL can use indexes on columns more efficiently if they are declared as the same type and size.

In this context, VARCHAR and CHAR are considered the same if they are declared as the same size.

tt.ActualPC is declared as CHAR(10) and et.EMPLOYID is CHAR(15), so there is a length mismatch.

To fix this disparity between column lengths, use ALTER TABLE to lengthen ActualPC from 10 characters to 15 characters:

ALTER TABLE tt MODIFY ActualPC VARCHAR(15);

Now tt.ActualPC and et.EMPLOYID are both VARCHAR(15).

Executing the EXPLAIN statement again produces this result:

table  type    possible_keys  key      key_len  ref          rows  Extra
tt     ALL     AssignedPC,    NULL     NULL     NULL         3872  Using where
               ClientID,
               ActualPC
do     ALL     PRIMARY        NULL     NULL     NULL         2135  range checked for each record (key map: 1)
et_1   ALL     PRIMARY        NULL     NULL     NULL         74    range checked for each record (key map: 1)
et     eq_ref  PRIMARY        PRIMARY  15       tt.ActualPC  1

This is not perfect, but it is much better: the product of the rows values is smaller by a factor of 74. This version executes in a couple of seconds.

A second alteration can be made to eliminate the column length mismatches for the tt.AssignedPC = et_1.EMPLOYID and tt.ClientID = do.CUSTNMBR comparisons:

ALTER TABLE tt MODIFY AssignedPC VARCHAR(15),
               MODIFY ClientID VARCHAR(15);

After that modification, EXPLAIN produces the output shown here:

table  type    possible_keys  key       key_len  ref            rows  Extra
et     ALL     PRIMARY        NULL      NULL     NULL           74
tt     ref     AssignedPC,    ActualPC  15       et.EMPLOYID    52    Using where
               ClientID,
               ActualPC
et_1   eq_ref  PRIMARY        PRIMARY   15       tt.AssignedPC  1
do     eq_ref  PRIMARY        PRIMARY   15       tt.ClientID    1

At this point, the query is optimized almost as well as possible.

The remaining problem is that, by default, MySQL assumes that values in the tt.ActualPC column are evenly distributed, and that is not the case for the tt table.

Fortunately, it is easy to tell MySQL to analyze the key distribution:

ANALYZE TABLE tt;

With the additional index information, the join is perfect and EXPLAIN produces this result:

table  type    possible_keys  key      key_len  ref            rows  Extra
tt     ALL     AssignedPC,    NULL     NULL     NULL           3872  Using where
               ClientID,
               ActualPC
et     eq_ref  PRIMARY        PRIMARY  15       tt.ActualPC    1
et_1   eq_ref  PRIMARY        PRIMARY  15       tt.AssignedPC  1
do     eq_ref  PRIMARY        PRIMARY  15       tt.ClientID    1

Note that the rows column in the output from EXPLAIN is an educated guess from the MySQL join optimizer.

You should check whether the numbers are even close to the truth by comparing the rows product with the actual number of rows that the query returns.

If the numbers are quite different, you might get better performance by using STRAIGHT_JOIN in your SELECT statement and trying to list the tables in a different order in the FROM clause.
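The rows products for the four MySQL EXPLAIN outputs above can be checked with a few lines of Python. The numbers are copied from the rows column of each output, in the order the tables are listed; the dictionary labels are only descriptive names.

```python
from math import prod

# rows column of each EXPLAIN output shown above, in table order.
plans = {
    "initial (all ALL)": [74, 2135, 74, 3872],
    "after ActualPC fix": [3872, 2135, 74, 1],
    "after AssignedPC/ClientID fix": [74, 52, 1, 1],
    "after ANALYZE TABLE": [3872, 1, 1, 1],
}

for name, rows in plans.items():
    print(f"{name:32} {prod(rows):>15,}")
```

The first two products differ by exactly a factor of 74, as stated above. The final estimate of 3,872 amounts to one pass over tt with a single eq_ref lookup into each other table; that is the figure to compare with the number of rows the query actually returns.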

OPTIMIZATION / Postgres

These issues do not seem to exist in Postgres. (The do table is called doo here; DO is a reserved word in PostgreSQL.)

EXPLAIN SELECT tt.ActualPC, et.EmployID, tt.ClientID
FROM tt, et, et AS et_1, doo
WHERE tt.ActualPC = et.EMPLOYID
  AND tt.AssignedPC = et_1.EMPLOYID
  AND tt.ClientID = doo.CUSTNMBR;

QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join (cost=65.49..160.23 rows=1495 width=38)
   Hash Cond: (tt.clientid = doo.custnmbr)
   -> Hash Join (cost=5.29..77.60 rows=1495 width=38)
         Hash Cond: (tt.assignedpc = et_1.employid)
         -> Hash Join (cost=2.64..54.40 rows=1495 width=49)
               Hash Cond: (tt.actualpc = et.employid)
               -> Seq Scan on tt (cost=0.00..30.59 rows=1659 width=33)
               -> Hash (cost=1.73..1.73 rows=73 width=16)
                     -> Seq Scan on et (cost=0.00..1.73 rows=73 width=16)
         -> Hash (cost=1.73..1.73 rows=73 width=16)
               -> Seq Scan on et et_1 (cost=0.00..1.73 rows=73 width=16)
   -> Hash (cost=33.98..33.98 rows=2098 width=16)
         -> Seq Scan on doo (cost=0.00..33.98 rows=2098 width=16)

Why so many sequential scans?

In reality, a few thousand records isn't many. The query optimizer decided that index lookups weren't worth the overhead; reading each table sequentially and joining the results with hash joins is cheaper at this size.

SPEEDING UP SELECTS

When the data stored in a database changes, the statistics used to optimize queries are not necessarily updated right away. Run the ANALYZE command on tables whose contents have changed significantly so that the planner works from current statistics. (In Postgres, the autovacuum daemon, enabled by default in modern versions, also analyzes tables automatically.)

WHERE CLAUSE OPTIMIZATION

First, note that any optimizations for the WHERE clause of a SELECT query also work for the WHERE clauses of DELETE and UPDATE queries.
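To see this, the sketch below asks SQLite (via Python, as an analogy for Postgres) for the plans of a SELECT, an UPDATE, and a DELETE that share the same WHERE clause; all three can locate their rows through the hypothetical `books_idx1` index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")
conn.execute("CREATE INDEX books_idx1 ON books(author_id)")

statements = [
    "SELECT title FROM books WHERE author_id = 16",
    "UPDATE books SET title = upper(title) WHERE author_id = 16",
    "DELETE FROM books WHERE author_id = 16",
]

# EXPLAIN QUERY PLAN only describes each statement; nothing is updated or deleted.
for sql in statements:
    details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(sql.split()[0], details)
```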

Examples of very fast queries

Some examples of queries that are very fast:

SELECT COUNT(*) FROM tbl_name;
SELECT MIN(key_part1), MAX(key_part1) FROM tbl_name;
SELECT MAX(key_part2) FROM tbl_name WHERE key_part1 = constant;
SELECT ... FROM tbl_name ORDER BY key_part1, key_part2, ... LIMIT 10;
SELECT ... FROM tbl_name ORDER BY key_part1 DESC, key_part2 DESC, ... LIMIT 10;

These examples were originally stated for MySQL. The last four rely on an index on (key_part1, key_part2, ...) and should behave similarly in most modern relational databases. The unfiltered COUNT(*) is the exception: it is nearly instant in MySQL's MyISAM engine, which stores each table's row count, but InnoDB and Postgres must scan the table (or an index) to count the rows.
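A sketch of why the index-based queries are fast, using SQLite through Python as a stand-in (table and index names follow the placeholders above; the `payload` column is hypothetical). Selecting only the indexed columns makes the index covering, so the ORDER BY ... LIMIT queries read rows straight out of the index in order, forwards or backwards, with no separate sort step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_name (key_part1 INTEGER, key_part2 INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_keys ON tbl_name(key_part1, key_part2)")


def plan(sql: str) -> list[str]:
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# MAX over the second key part for a fixed first key part: answered from the index.
max_plan = plan("SELECT MAX(key_part2) FROM tbl_name WHERE key_part1 = 5")
# Index order matches ORDER BY, so no "USE TEMP B-TREE FOR ORDER BY" step is needed.
asc_plan = plan("SELECT key_part1, key_part2 FROM tbl_name ORDER BY key_part1, key_part2 LIMIT 10")
desc_plan = plan(
    "SELECT key_part1, key_part2 FROM tbl_name ORDER BY key_part1 DESC, key_part2 DESC LIMIT 10"
)

for p in (max_plan, asc_plan, desc_plan):
    print(p)
```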

INSERT

The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:

Connecting: (3)
Sending query to server: (2)
Parsing query: (2)
Inserting row: (1 × size of row)
Inserting indexes: (1 × number of indexes)
Closing: (1)
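Plugging these proportions into a toy cost model shows why batching pays off. The model is an illustration built only from the weights above; it assumes "size of row" is measured in the same abstract units and that sending and parsing cost the same for one multi-row statement as for a single-row one, which real servers only approximate. The function names are hypothetical.

```python
# Weights from the list above (abstract units, not milliseconds).
CONNECT, SEND, PARSE, CLOSE = 3, 2, 2, 1


def one_connection_per_row(rows: int, row_size: int, indexes: int) -> int:
    """Every row pays for its own connection, statement, and close."""
    per_row = CONNECT + SEND + PARSE + row_size + indexes + CLOSE
    return rows * per_row


def single_multi_row_insert(rows: int, row_size: int, indexes: int) -> int:
    """One connection and one statement; only the row and index work repeats."""
    return CONNECT + SEND + PARSE + CLOSE + rows * (row_size + indexes)


if __name__ == "__main__":
    n, size, idx = 1000, 1, 2
    print(one_connection_per_row(n, size, idx))   # 11000
    print(single_multi_row_insert(n, size, idx))  # 3008
```

With 1,000 rows and two indexes, the fixed overhead dominates the row-at-a-time approach, and batching cuts the modeled cost by more than two thirds.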

If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.
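A minimal sketch of building one multi-row INSERT from application code, using Python's standard sqlite3 module as a stand-in for PostgreSQL (both accept the multi-row VALUES syntax). The table a and its values are borrowed from the LOCK example in these notes; the helper name insert_rows is hypothetical. Very large batches should still be split into chunks, since database engines limit statement size (SQLite, for example, caps the number of ? parameters per statement).

```python
import sqlite3


def insert_rows(conn: sqlite3.Connection, rows: list[tuple[int, int]]) -> None:
    """Insert all rows with a single INSERT ... VALUES (?, ?), (?, ?), ... statement."""
    if not rows:
        return
    placeholders = ", ".join(["(?, ?)"] * len(rows))
    params = [value for row in rows for value in row]  # flatten [(1, 23), ...] -> [1, 23, ...]
    conn.execute(f"INSERT INTO a (x, y) VALUES {placeholders}", params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (x INTEGER, y INTEGER)")
insert_rows(conn, [(1, 23), (2, 34), (4, 33), (8, 26), (6, 29)])
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]  # 5
```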

When loading a table from a text file, use COPY FROM. This is usually significantly faster than using INSERT statements. See http://www.postgresql.org/docs/8.1/static/sql-copy.html

While outside of the scope of what I wish to cover, if you are loading a large amount of data in proportion to the current size of the table, it may be more efficient to drop the indexes, load the file, and then recreate the indexes.
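The drop, load, recreate pattern might look like the sketch below, again using Python's sqlite3 module as a stand-in. SQLite has no COPY command, so executemany plays the role of the bulk load here (an assumption; in PostgreSQL you would use COPY for that step). The index name idx_a_x is hypothetical. Running all three steps inside one transaction means a failure rolls back the DROP INDEX as well, leaving the original index in place.

```python
import sqlite3


def bulk_load(conn: sqlite3.Connection, rows: list[tuple[int, int]]) -> None:
    """Drop the index, load the rows, then rebuild the index, all in one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute("DROP INDEX IF EXISTS idx_a_x")
        conn.executemany("INSERT INTO a (x, y) VALUES (?, ?)", rows)
        # Building the index once over all rows avoids updating it row by row.
        conn.execute("CREATE INDEX idx_a_x ON a (x)")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit: we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE a (x INTEGER, y INTEGER)")
conn.execute("CREATE INDEX idx_a_x ON a (x)")
bulk_load(conn, [(i, i * 2) for i in range(10_000)])
```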


To speed up INSERT operations that are performed with multiple statements, the MySQL manual advises locking non-transactional tables. Every PostgreSQL table is transactional, so the equivalent there is to run all the inserts in one transaction, optionally taking an explicit table lock first:

BEGIN WORK;

LOCK TABLE a IN ACCESS EXCLUSIVE MODE;

INSERT INTO a VALUES (1,23),(2,34),(4,33);

INSERT INTO a VALUES (8,26),(6,29);

COMMIT WORK;

http://www.postgresql.org/docs/8.1/static/sql-lock.html

There is no LOCK TABLE in the SQL standard, which instead uses SET TRANSACTION to specify concurrency levels on transactions. PostgreSQL supports that too; see SET TRANSACTION for details.

Copying a file into a table with COPY FROM is still faster than the method demonstrated above.
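SQLite, used here through Python's sqlite3 module as a stand-in, has no LOCK TABLE; its closest equivalent is BEGIN EXCLUSIVE, which locks the whole database file rather than one table (a coarser lock than PostgreSQL's ACCESS EXCLUSIVE mode, so treat this as an analogy). The sketch runs the same two INSERTs inside an exclusive transaction and shows that a second connection cannot even read the table until COMMIT. The file name demo.db is arbitrary, and the behaviour shown assumes SQLite's default rollback-journal mode (in WAL mode readers are not blocked).

```python
import os
import sqlite3
import tempfile


def exclusive_insert_demo() -> tuple[bool, int]:
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    writer = sqlite3.connect(path, isolation_level=None)             # autocommit mode
    reader = sqlite3.connect(path, isolation_level=None, timeout=0)  # fail fast instead of waiting
    writer.execute("CREATE TABLE a (x INTEGER, y INTEGER)")

    writer.execute("BEGIN EXCLUSIVE")  # take the lock before any inserts
    writer.execute("INSERT INTO a VALUES (1, 23), (2, 34), (4, 33)")
    writer.execute("INSERT INTO a VALUES (8, 26), (6, 29)")
    try:
        reader.execute("SELECT COUNT(*) FROM a").fetchone()
        reader_blocked = False
    except sqlite3.OperationalError:  # raised because the database is locked
        reader_blocked = True
    writer.execute("COMMIT")

    count = reader.execute("SELECT COUNT(*) FROM a").fetchone()[0]
    writer.close()
    reader.close()
    return reader_blocked, count
```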


GENERAL TIPS

Use persistent connections to the database to avoid connection overhead.

After a large number of rows are removed from a table, run VACUUM to reclaim the space held by the deleted rows and ANALYZE to update the statistics the query planner relies on.

One recommendation I do not agree with:

Try to keep column names simple. For example, in a table named customer, use a column name of name instead of customer_name. I DISAGREE!

To make your names portable to other SQL servers, you should keep them shorter than 18 characters.
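SQLite also has VACUUM and ANALYZE, so the tip can be tried with Python's sqlite3 module. This is an analogy rather than PostgreSQL behaviour: SQLite's VACUUM rebuilds the whole database file, whereas a plain PostgreSQL VACUUM marks the space of dead rows as reusable in place. The sketch deletes most rows from a hypothetical table t and compares the number of free pages (PRAGMA freelist_count) before and after VACUUM.

```python
import os
import sqlite3
import tempfile


def vacuum_demo() -> tuple[int, int]:
    path = os.path.join(tempfile.mkdtemp(), "tips.db")
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit; VACUUM cannot run inside a transaction
    conn.execute("PRAGMA auto_vacuum = NONE")  # keep freed pages on the freelist until VACUUM
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO t (payload) VALUES (?)", [("x" * 500,) for _ in range(2000)])
    conn.execute("COMMIT")

    conn.execute("DELETE FROM t WHERE id <= 1800")  # remove most of the rows
    free_before = conn.execute("PRAGMA freelist_count").fetchone()[0]

    conn.execute("ANALYZE")  # refresh planner statistics
    conn.execute("VACUUM")   # rebuild the file without the free pages
    free_after = conn.execute("PRAGMA freelist_count").fetchone()[0]
    conn.close()
    return free_before, free_after
```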


SQL TRANSACTIONS

It is often important to ensure that a series of statements either take effect together as an atomic unit or do not take effect at all.

For example, if you wanted to transfer money from one account to another, you would not want the removal of the funds from one account to occur without the depositing of those funds in the second account. If something happened to prevent the depositing of the funds, then you would want the withdrawal cancelled.

This is accomplished through the use of transactions.

In PostgreSQL, a transaction is set up by surrounding the SQL commands of the transaction with BEGIN and COMMIT commands. The syntax is simple:

BEGIN;

-- any SQL commands you wish to execute atomically

COMMIT;

Issuing ROLLBACK instead of COMMIT cancels every command since BEGIN.

http://www.postgresql.org/docs/8.3/static/tutorial-transactions.html
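The bank-transfer scenario can be sketched with Python's sqlite3 module standing in for PostgreSQL (an assumption; the BEGIN, COMMIT and ROLLBACK statements are the same). The accounts table, the names Alice and Bob, and the transfer helper are hypothetical. If either UPDATE fails, or touches no row, the whole transfer is rolled back, so money is never withdrawn without being deposited.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; transactions are explicit
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES ('Alice', 100), ('Bob', 50)")
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    conn.execute("BEGIN")
    try:
        withdrawn = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        ).rowcount
        deposited = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        ).rowcount
        if withdrawn != 1 or deposited != 1:
            raise ValueError("unknown account")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undo the withdrawal as well as any deposit
        raise


def balances(conn: sqlite3.Connection) -> dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

For example, transfer(conn, "Alice", "Bob", 30) commits, while a transfer to a nonexistent account or one that would overdraw Alice raises an exception and leaves both balances untouched.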


Note that in some database systems (MySQL, for example) certain statements cannot be rolled back. These are typically statements that alter the definition of the database or table structure.

If such a statement is included in a transaction and another statement in the transaction later fails, a full rollback to the beginning of the transaction cannot occur. PostgreSQL, by contrast, can roll back most such statements, including CREATE TABLE and ALTER TABLE, along with the rest of the transaction.

Transactions can be broken up so that you can roll back to a specific point within the transaction using the SAVEPOINT command:

SAVEPOINT identifier – define a new savepoint

ROLLBACK TO SAVEPOINT identifier – undo everything done since the savepoint was set

RELEASE SAVEPOINT identifier – discard the savepoint, keeping the work done after it

http://www.postgresql.org/docs/8.1/static/sql-savepoint.html
http://developer.postgresql.org/pgdocs/postgres/sql-rollback-to.html
http://www.postgresql.org/docs/8.1/static/sql-release-savepoint.html
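The three commands can be tried with Python's sqlite3 module, which accepts the same SAVEPOINT, ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT statements (SQLite standing in for PostgreSQL is an assumption of this sketch; the log table and the savepoint name before_step2 are hypothetical). Only the work done after the savepoint is undone; the rest of the transaction still commits.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; transactions are explicit
conn.execute("CREATE TABLE log (msg TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('step 1')")
conn.execute("SAVEPOINT before_step2")
conn.execute("INSERT INTO log VALUES ('step 2')")
conn.execute("ROLLBACK TO SAVEPOINT before_step2")  # undo step 2 only; the savepoint remains
conn.execute("RELEASE SAVEPOINT before_step2")      # discard the savepoint, keep step 1
conn.execute("INSERT INTO log VALUES ('step 3')")
conn.execute("COMMIT")

messages = [row[0] for row in conn.execute("SELECT msg FROM log ORDER BY rowid")]
# ['step 1', 'step 3']
```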


VIEWS

A view in SQL is a great way to present information to a user in a form other than the logical table structure.

You might do this to limit access to certain fields, e.g. Social Security Number or date of birth.

You might wish to derive a field, e.g. age from date of birth.

You might wish to denormalize a series of tables and remove the ID fields so they do not confuse someone generating reports from the data.


The actual syntax has more options than this, but a simple form for creating a view is as follows:

CREATE VIEW ViewName AS SelectStatement;

So if you had the following table:

LastName   FirstName  DOB
Jeter      Derek      5/6/1974
Schilling  Curt       4/4/1961
Ruth       Babe       1/1/1900

BaseballPlayers table

You could create a view that would show the following:

LastName   FirstName  Age
Jeter      Derek      33
Schilling  Curt       46
Ruth       Babe       Really Old

BaseballPlayers2 view

The BaseballPlayers2 view can be created with the following statement:

CREATE VIEW BaseballPlayers2 AS SELECT LastName, FirstName, EXTRACT(YEAR FROM age(DOB)) AS Age FROM BaseballPlayers;

Age is computed relative to the current date, so the numbers above reflect when these notes were written; showing "Really Old" instead of a number would additionally require a CASE expression.

http://www.postgresql.org/docs/8.1/static/sql-createview.html
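A runnable sketch of the same idea, using Python's sqlite3 module as a stand-in for PostgreSQL. SQLite has no age() function, so the view computes whole years from strftime() pieces, and it uses a fixed reference date of 2008-01-01 instead of the current date (an assumption chosen so the output matches the view shown above; a real view would use the current date). The CASE expression produces the "Really Old" entry, with 100 years as an arbitrary cut-off, and dates are stored in ISO YYYY-MM-DD form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BaseballPlayers (LastName TEXT, FirstName TEXT, DOB TEXT);
INSERT INTO BaseballPlayers VALUES
    ('Jeter', 'Derek', '1974-05-06'),
    ('Schilling', 'Curt', '1961-04-04'),
    ('Ruth', 'Babe', '1900-01-01');

-- Inner query: whole years between DOB and the fixed reference date,
-- minus one if the birthday has not yet occurred in the reference year.
CREATE VIEW BaseballPlayers2 AS
SELECT LastName, FirstName,
       CASE WHEN years > 100 THEN 'Really Old' ELSE CAST(years AS TEXT) END AS Age
FROM (
    SELECT LastName, FirstName,
           CAST(strftime('%Y', '2008-01-01') AS INTEGER)
             - CAST(strftime('%Y', DOB) AS INTEGER)
             - (strftime('%m-%d', '2008-01-01') < strftime('%m-%d', DOB)) AS years
    FROM BaseballPlayers
);
""")

rows = conn.execute(
    "SELECT LastName, FirstName, Age FROM BaseballPlayers2 ORDER BY LastName"
).fetchall()
# [('Jeter', 'Derek', '33'), ('Ruth', 'Babe', 'Really Old'), ('Schilling', 'Curt', '46')]
```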


A view cannot have the same name as a table that already exists.

Views must have unique column names.

Columns selected can be column names or expressions. If you create an expression, name it!

A view can be created from many kinds of SELECT statements. It can refer to base tables or other views. It can use joins, UNION, and subqueries.

In PostgreSQL 8.1, which the link below documents, views are read only: the system will not allow an insert, update, or delete on a view. You can get the effect of an updatable view by creating rules that rewrite inserts, etc. on the view into appropriate actions on other tables. For more information see CREATE RULE. Later releases added INSTEAD OF triggers on views (9.1) and automatically updatable simple views (9.3).

http://www.postgresql.org/docs/8.1/static/sql-createview.html
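SQLite, used here through Python's sqlite3 module as a stand-in, behaves like PostgreSQL 8.1 in this respect: a plain view is read only. It has no CREATE RULE, but it does support INSTEAD OF triggers, a different mechanism that achieves the same effect of redirecting writes on the view to a base table. The view PlayerNames and the trigger PlayerNames_insert are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE BaseballPlayers (LastName TEXT, FirstName TEXT, DOB TEXT);
CREATE VIEW PlayerNames AS SELECT LastName, FirstName FROM BaseballPlayers;
""")


def insert_via_view(last: str, first: str) -> bool:
    """Try to insert through the view; report whether SQLite accepted it."""
    try:
        conn.execute("INSERT INTO PlayerNames VALUES (?, ?)", (last, first))
        return True
    except sqlite3.OperationalError:  # SQLite refuses to modify a view without an INSTEAD OF trigger
        return False


before_trigger = insert_via_view("Jeter", "Derek")  # False: plain views are read only

# Redirect inserts on the view into the base table; DOB is left NULL.
conn.execute("""
CREATE TRIGGER PlayerNames_insert INSTEAD OF INSERT ON PlayerNames
BEGIN
    INSERT INTO BaseballPlayers (LastName, FirstName, DOB)
    VALUES (NEW.LastName, NEW.FirstName, NULL);
END
""")

after_trigger = insert_via_view("Jeter", "Derek")  # True
```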


Use the DROP VIEW statement to drop views.

Be careful that the names and types of the view's columns will be assigned the way you want. For example,

CREATE VIEW vista AS SELECT 'Hello World';

is bad form in two ways: the column name defaults to ?column?, and the column data type defaults to unknown. If you want a string literal in a view's result, use something like

CREATE VIEW vista AS SELECT text 'Hello World' AS hello;


DATE/TIME

WORKING with DATETIME, DATE, and INTERVAL VALUES

Types

DATETIME or TIMESTAMP – Structured "real" date and time values, containing year, month, day, hour, minute, second and fractional second (PostgreSQL stores microseconds), for all useful date & time values (4713 BC to over 100,000 AD).

DATE – Simplified integer-based representation of a date, defining only year, month, and day.

INTERVAL – Structured value showing a period of time, including any/all of years, months, weeks, days, hours, minutes, seconds, and milliseconds. "1 day", "42 minutes 10 seconds", and "2 years" are all INTERVAL values.


Which do I want to use: DATE or TIMESTAMP? I don't need minutes or hours in my value.

That depends. DATE is easier to work with for arithmetic (e.g. something recurring every arbitrary number of days), takes less storage space, and doesn't trail "00:00:00" strings you don't need when printed. However, TIMESTAMP is far better for real calendar calculations (e.g. something that happens on the 15th of each month or the 2nd Thursday of leap years). More below.

1. The difference between two TIMESTAMPs is always an INTERVAL:
TIMESTAMP '1999-12-30' - TIMESTAMP '1999-12-11' = INTERVAL '19 days'

2. You may add or subtract an INTERVAL to a TIMESTAMP to produce another TIMESTAMP:
TIMESTAMP '1999-12-11' + INTERVAL '19 days' = TIMESTAMP '1999-12-30'

3. You may add or subtract two INTERVALs:
INTERVAL '1 month' + INTERVAL '1 month 3 days' = INTERVAL '2 months 3 days'
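Python's datetime module follows the first two rules (a datetime minus a datetime is a timedelta, and adding a timedelta gives a datetime), but timedelta has no month unit, so month-based intervals and "the 15th of each month" calculations need extra code. The helper add_months below is hypothetical; it assumes PostgreSQL-style behaviour for adding months, where a day that does not exist in the target month is clamped to that month's last day (so 2001-01-31 plus one month gives 2001-02-28).

```python
import calendar
from datetime import datetime, timedelta


def add_months(ts: datetime, months: int) -> datetime:
    """Move ts by whole calendar months, clamping the day to the target month's length."""
    month_index = ts.month - 1 + months          # zero-based month count; may be negative
    year = ts.year + month_index // 12           # floor division handles negative offsets
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return ts.replace(year=year, month=month, day=min(ts.day, last_day))


# Rules 1 and 2 map directly onto datetime and timedelta.
gap = datetime(1999, 12, 30) - datetime(1999, 12, 11)   # timedelta(days=19)
later = datetime(1999, 12, 11) + timedelta(days=19)     # datetime(1999, 12, 30)

# "The 15th of each month" for the first quarter of 2000.
fifteenths = [add_months(datetime(2000, 1, 15), k) for k in range(3)]
```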


With DATE values the arithmetic is simpler:

1. The difference between two DATEs is always an INTEGER, representing the number of days difference:
DATE '1999-12-30' - DATE '1999-12-11' = INTEGER 19

2. You may add or subtract an INTEGER to a DATE to produce another DATE:
DATE '1999-12-11' + INTEGER 19 = DATE '1999-12-30'
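For DATE values, Python's date type gives the same results once the timedelta it returns is reduced to its .days count. The helper names below are hypothetical, and every_n_days illustrates a schedule that repeats every N days, the kind of calculation where plain day arithmetic is convenient.

```python
from datetime import date, timedelta


def days_between(later: date, earlier: date) -> int:
    """Like DATE - DATE in PostgreSQL: a plain integer number of days."""
    return (later - earlier).days


def add_days(d: date, n: int) -> date:
    """Like DATE + INTEGER in PostgreSQL (a negative n subtracts days)."""
    return d + timedelta(days=n)


def every_n_days(start: date, n: int, count: int) -> list[date]:
    """The first `count` dates of a schedule that repeats every n days."""
    return [add_days(start, n * k) for k in range(count)]


print(days_between(date(1999, 12, 30), date(1999, 12, 11)))  # 19
print(add_days(date(1999, 12, 11), 19))                      # 1999-12-30
```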


Date/time functions (Function – Description. Example = Result):

age(timestamp, timestamp) – Subtract arguments, producing a "symbolic" result that uses years and months. Example: age(timestamp '2001-04-10', timestamp '1957-06-13') = 43 years 9 mons 27 days

age(timestamp) – Subtract from current_date. Example: age(timestamp '1957-06-13') = 43 years 8 mons 3 days (this result depends on the date the query is run)

current_date – Today's date; see Section 9.9.4

current_time – Time of day; see Section 9.9.4

current_timestamp – Date and time; see Section 9.9.4

date_part(text, timestamp) – Get subfield (equivalent to extract); see Section 9.9.1. Example: date_part('hour', timestamp '2001-02-16 20:38:40') = 20

date_part(text, interval) – Get subfield (equivalent to extract); see Section 9.9.1. Example: date_part('month', interval '2 years 3 months') = 3

date_trunc(text, timestamp) – Truncate to specified precision; see also Section 9.9.2. Example: date_trunc('hour', timestamp '2001-02-16 20:38:40') = 2001-02-16 20:00:00

extract(field from timestamp) – Get subfield; see Section 9.9.1. Example: extract(hour from timestamp '2001-02-16 20:38:40') = 20

extract(field from interval) – Get subfield; see Section 9.9.1. Example: extract(month from interval '2 years 3 months') = 3

isfinite(timestamp) – Test for finite time stamp (not equal to infinity). Example: isfinite(timestamp '2001-02-16 21:28:30') = true

isfinite(interval) – Test for finite interval. Example: isfinite(interval '4 hours') = true

justify_hours(interval) – Adjust interval so 24-hour time periods are represented as days. Example: justify_hours(interval '24 hours') = 1 day

justify_days(interval) – Adjust interval so 30-day time periods are represented as months. Example: justify_days(interval '30 days') = 1 month

localtime – Time of day; see Section 9.9.4

localtimestamp – Date and time; see Section 9.9.4

now() – Current date and time (equivalent to current_timestamp); see Section 9.9.4

timeofday() – Current date and time; see Section 9.9.4