AM18 ASA INTERNALS: DATA MANAGEMENT

GLENN PAULLEY, DEVELOPMENT MANAGER

paulley@ianywhere.com

AUGUST 2005



Goals of this presentation

Overview of data management and query processing in Adaptive Server Anywhere 9.0.2

Concentrate on performance issues and problem areas

Provide an overview of SQL Anywhere 9.0 technology

Highlight planned features for the Jasper release

Agenda

Section One: SQL language support, data management

Section Two: query execution and optimization


Design goals of SQL Anywhere Studio

Ease of administration

Good out-of-the-box performance

“Embeddability” features: self-tuning

Cross-platform support

Interoperability


Motivation for the ASA 9.0 release

Exploit the new architecture of 8.0 and add support for additional language features, including:

GROUP BY ROLLUP

RECURSIVE UNION

Window functions and other OLAP support

XML

Table Functions

INTERSECT and EXCEPT

ORDER BY, SELECT TOP N in any query block, including views

Improve performance


Highlights of the ASA 9.0 releases

HTTP server

ASA Index Consultant

Improved performance, scalability

better scalability in OLTP environments

Query processing improvements

optimization refinements – particularly with the server’s cost model

histograms modified according to update DML statements

alternate, efficient execution methods for complex queries

SNMP support

9.0.1 EBF build 1828, Windows platforms only

Formally part of the 9.0.2 release


Performance, performance, performance

Version comparison, 10GB DB, Minutes

Query  7.0.4.2788  8.0.0.2065  9.0.0.1073  9.0.1.1751   10.0.1212
Q01          14.6         7.7         4.6         4.2         3.8
Q02           1.1         1.0         2.6         0.7         0.6
Q03         1068.         8.1         3.1         5.7         2.2
Q04          20.7         6.8         2.4         1.9         1.7
Q05          52.8         7.9         3.3         2.8         2.4
Q06           1.0         2.7         1.0         1.2         1.0
Q07         515.2       672.7         3.2         3.3         2.9
Q08          90.2         9.2         3.4         2.9         2.5
Q09         825.1       717.9         6.2         4.7         4.2
Q10          29.1        13.6         3.5         2.5         2.0
Q11          16.1         1.9         0.7         0.5         0.5
Q12          12.8         6.5         2.4         1.9         1.8
Q13         177.8        13.5         3.7         1.5         1.5
Q14           3.8         2.5         0.3         0.4         0.3
Q15           1.2         4.9         0.5         1.5         0.6
Q16           2.9         5.2         2.6         1.5         1.2
Q17           8.3         6.0         4.7         2.2         1.4
Q18         227.3       1500.        14.1         6.7         4.5
Q19         1500.       1500.         3.2         2.3         1.9
Q20         1500.       1500.         1.5         1.9         1.7
Q21         1500.       1500.         8.9         6.6         5.8
Q22         1500.       1500.         0.9         0.7         1.1
Avg         412.2       408.6         3.5         2.6         2.1


Contents

Language Support

New SQL constructs supported with 9.0.1

Data Management in 9.0.1

Database organization

Table storage organization

Index storage organization

Physical database design tips

Jasper features


New SQL language support in 9.0.1

Table functions (SELECT over a stored procedure)

ORDER BY clause now supported in all SELECT blocks

Necessary to support SELECT TOP n in derived tables, views, and subqueries with correct semantics

RECURSIVE UNION (bill-of-materials) queries

INTERSECT and EXCEPT query expressions

LATERAL keyword for derived tables

Now necessary for derived tables or table expressions containing outer references

WITH clause (common table expressions)

Essentially in-lined view definitions


New SQL language support in 9.0.1

SELECT TOP n START AT m

Equivalent functionality to that in MySQL, Postgres

n and m can be variables or host variables

WITH INDEX hint in FROM clause

Named CHECK, PK, FK, UNIQUE constraints

Constraint violation message refers to the constraint name

New catalog tables:

SYSCONSTRAINT contains information about all constraints, even referential integrity constraints

SYSCHECK contains the body of the CHECK constraint; now permit multiple CHECK constraints on the same column(s)

Specific CHECK constraint that is violated appears in error

Not available in older database formats, even if DBUPGRAD is used


New SQL language support in 9.0.1

OLAP support

VARIANCE, STD_DEV aggregate functions

ORDER BY clause for LIST aggregate function

GROUP BY ROLLUP, CUBE, GROUPING SETS

Binary set functions (linear regression, co-variance, etc.)

Rank functions

Windowed aggregate functions

Construct “moving average” results in a single SQL statement

Support for multiple DISTINCT aggregate functions in a single SELECT block

Necessitates the use of Hash Group By


New SQL language support in 9.0.1

Support for SET statement in Transact-SQL dialect stored procedures

Implemented for MS SQL Server compatibility

EXECUTE IMMEDIATE extensions

Procedures can now use EXECUTE IMMEDIATE to execute dynamically-constructed queries which return a result set

WITH ESCAPES ON | OFF

WITH QUOTES ON | OFF

Variable assignment permitted in UPDATE statements (8.0.1)

SELECT INTO base-table


New SQL language support in 9.0.1

FOR XML AUTO, FOR XML RAW, FOR XML EXPLICIT, OPENXML procedure (supports XPATH queries over XML column values)

SQLX functionality: xmlelement(), xmlforest(), xmlgen(), xmlconcat(), and xmlagg()

EXPRTYPE() function – outputs the type of the expression argument

Useful when defining computed columns

LOCATE() can handle negative offsets

INSERT WITH AUTO NAME (8.0.2)


Table functions

Result set description determined from the catalog; result set must match exactly

Otherwise SQLSTATE ‘WP012’

Workaround: use the WITH clause to annotate the procedure reference in the FROM clause:

SELECT * FROM PROC() WITH( X Integer, Y char(17) )

SELECT *
FROM SYS.SYSTABLE as st, sa_table_fragmentation() as tbfrg
WHERE st.table_name = tbfrg.tablename


Table functions

Procedure may return only one result set

Statistics regarding cost, result set cardinality of the procedure are captured at run time; used for subsequent requests

Statistics are stored in SYS.SYSPROCEDURE

Minimally requires DBUPGRAD of older databases to 9.0.0


Recursive UNION

SQL-2003 implementation of recursive (bill-of-materials) queries

Only DB2 also offers RECURSIVE UNION support; Oracle implements a ‘cycle’ clause

Uses specialized join operators: recursive hash inner and outer joins

will utilize a nested-loop strategy if inputs are small; done adaptively at run-time during query execution

WITH RECURSIVE r (level, emp_id, manager_id) as
  (SELECT 1, emp_id, manager_id
   FROM employee
   WHERE emp_id = manager_id
   UNION ALL
   SELECT level+1, e.emp_id, e.manager_id
   FROM employee e JOIN r ON (e.manager_id = r.emp_id)
   WHERE e.emp_id <> e.manager_id and level < 3)
SELECT * FROM r
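The recursion in this query can be tried outside ASA. The following is a minimal sketch, assuming Python's bundled sqlite3 module as a stand-in engine (SQLite also accepts WITH RECURSIVE with UNION ALL). The helper name org_levels and the sample rows are hypothetical, and SQLite's execution strategy says nothing about ASA's recursive hash joins.

```python
import sqlite3


def org_levels(rows, max_level=3):
    """Return (level, emp_id, manager_id) for every employee reachable
    from a root (an employee who is their own manager), down to max_level."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, manager_id INTEGER)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    query = """
        WITH RECURSIVE r (level, emp_id, manager_id) AS (
            -- anchor block: the root of the hierarchy
            SELECT 1, emp_id, manager_id
            FROM employee
            WHERE emp_id = manager_id
            UNION ALL
            -- recursive block: direct reports of rows already in r
            SELECT r.level + 1, e.emp_id, e.manager_id
            FROM employee e JOIN r ON e.manager_id = r.emp_id
            WHERE e.emp_id <> e.manager_id AND r.level < ?
        )
        SELECT level, emp_id, manager_id FROM r ORDER BY level, emp_id
    """
    result = conn.execute(query, (max_level,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Hypothetical org chart: 1 is the root; 2 and 3 report to 1; 4 to 2; 5 to 4.
    for row in org_levels([(1, 1), (2, 1), (3, 1), (4, 2), (5, 4)]):
        print(row)
```

With the level limit of 3, employee 5 (four levels down) is not returned, which mirrors the effect of the `level < 3` predicate in the slide's query.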


Recursive UNION: restrictions

Query expression must be UNION ALL

Recursive reference must be in a query block that does not contain DISTINCT, aggregation, or an ORDER BY clause

Recursive reference in a LEFT OUTER JOIN is permitted

Schema of WITH clause must match recursive query

Implicit type conversions involving truncation can yield undesired results; SQLSTATE 42WA2 returned if server detects a type mismatch

Use CAST to ensure compatible types

Infinite queries are possible; server kills the query after N recursions controlled by the new connection option MAX_RECURSIVE_ITERATIONS (default 100)


INTERSECT and EXCEPT

Implement set/bag difference and set/bag intersection

Both ALL and DISTINCT variants are supported; DISTINCT performed by default

Form query expressions in the same fashion as UNION

NULL treated as a special value in each domain, hence NULLs are equivalent to each other

Useful when formulating queries that require counting of identical rows

See the help for order-of-precedence amongst the set operators


EXCEPT and INTERSECT ALL

Rewrite to transform ALL to DISTINCT done automatically by the optimizer

Both EXCEPT and INTERSECT can be computed through either a merge or hashing technique

Also supports an (expensive) nested-loop strategy in case a cache shortage is encountered

With ALL variants:

implicitly performs aggregation to count the number of duplicate rows in each input

A new query execution operator, ROW REPLICATE, generates the required copies of each row

SELECT description FROM product
EXCEPT ALL
SELECT description FROM product as p2 WHERE quantity < 15
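The ALL variants described on this slide (count the duplicates in each input, then replicate each row the required number of times) can be sketched in a few lines. This is an illustration of the bag semantics only, not the server's operators; the function names are hypothetical, and Python's Counter stands in for the implicit aggregation step.

```python
from collections import Counter


def except_all(left, right):
    """Bag difference: each row appears max(count_left - count_right, 0) times."""
    left_counts, right_counts = Counter(left), Counter(right)
    out = []
    for row, n in left_counts.items():
        # "row replicate": emit the surviving number of copies
        out.extend([row] * max(n - right_counts.get(row, 0), 0))
    return out


def intersect_all(left, right):
    """Bag intersection: each row appears min(count_left, count_right) times."""
    left_counts, right_counts = Counter(left), Counter(right)
    out = []
    for row, n in left_counts.items():
        out.extend([row] * min(n, right_counts.get(row, 0)))
    return out


if __name__ == "__main__":
    a = ["Tee Shirt", "Tee Shirt", "Visor", None, None]
    b = ["Tee Shirt", None]
    print(except_all(a, b))
    print(intersect_all(a, b))
```

None plays the role of NULL here: as a dictionary key it compares equal to itself, which matches the rule that NULLs are treated as equivalent to each other by the set operators.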


GROUP BY ROLLUP

Computes aggregates as usual, but result set contains multiple sets of groups

Logically, grouping is performed N+1 times for N grouping expressions

Essentially implements the functionality of COBOL Report Writer in a single SQL request

SELECT state, zip, count(*), grouping(zip), grouping(state)
FROM customer
GROUP BY ROLLUP (state, zip)


GROUP BY CUBE

Computes aggregates as usual, but result set contains the power set of the N grouping expressions

Expensive to execute for large N

Result can be restricted through the specification of GROUPING SETS

SELECT state, zip, count(*), grouping(zip), grouping(state)
FROM customer
GROUP BY CUBE (state, zip)

SELECT state, zip, count(*), grouping(zip), grouping(state)
FROM customer
GROUP BY GROUPING SETS ( (state, zip), state, zip, () )
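To make the difference in result size concrete, the sketch below enumerates the grouping sets that ROLLUP and CUBE imply for a list of grouping expressions: N+1 prefixes for ROLLUP, the full power set (2**N subsets) for CUBE. The function names are hypothetical; this only lists the sets and does not compute any aggregates.

```python
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP(c1..cN): the N+1 prefixes, from all columns down to ()."""
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE(c1..cN): every subset of the columns, 2**N grouping sets."""
    return [
        subset
        for k in range(len(cols), -1, -1)
        for subset in combinations(cols, k)
    ]


if __name__ == "__main__":
    print(rollup_sets(["state", "zip"]))
    print(cube_sets(["state", "zip"]))
    print(len(cube_sets(list("abcdefgh"))))  # grows exponentially with N
```

For (state, zip) the CUBE enumeration is exactly the list written out in the GROUPING SETS query on this slide: (state, zip), state, zip, and the empty set.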


WINDOW functions

Part of SQL OLAP extensions

Computes aggregates (except LIST) over a window of rows

Provides an ANSI-compliant way to number the rows of a result set

ROW_NUMBER() rather than NUMBER(*)

Useful to:

Compute cumulative aggregates, or “moving averages”

Eliminate the need for correlated subqueries involving aggregation


WINDOW functions

List employees, by department, in four US states by their start dates, along with their cumulative salaries:

SELECT dept_id, emp_lname, start_date, salary,
       SUM(salary) OVER (PARTITION BY dept_id ORDER BY start_date
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "Sum_Salary"
FROM employee
WHERE state IN ('CA', 'UT', 'NY', 'AZ') AND dept_id IN ('100', '200')
ORDER BY dept_id, start_date;
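What the window in this query computes can be shown procedurally. The sketch below is a plain-Python approximation with hypothetical names and sample data: rows are partitioned by department, ordered by start date, and given a running total. Because the frame is RANGE-based, rows that share a start date are peers and receive the same cumulative value.

```python
from itertools import groupby
from operator import itemgetter


def cumulative_salary(rows):
    """rows: (dept_id, start_date, salary) tuples.
    Returns (dept_id, start_date, salary, sum_salary) tuples, where sum_salary
    follows RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW semantics."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # PARTITION BY, then ORDER BY
    for dept, dept_rows in groupby(ordered, key=itemgetter(0)):
        running = 0
        # rows with equal start_date are peers: add them to the total together
        for start, peers in groupby(dept_rows, key=itemgetter(1)):
            peers = list(peers)
            running += sum(salary for _, _, salary in peers)
            out.extend((dept, start, salary, running) for _, _, salary in peers)
    return out


if __name__ == "__main__":
    sample = [
        ("100", "1990-03-01", 40000),
        ("100", "1990-03-01", 50000),
        ("100", "1992-07-15", 30000),
        ("200", "1991-01-10", 45000),
    ]
    for row in cumulative_salary(sample):
        print(row)
```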


WINDOW functions

List all orders (with part information) where the part quantity cannot cover the maximum single order for that part:

SELECT order_qty.id, o.order_date, p.*, max_q
FROM ( SELECT s.id, s.prod_id,
              MAX(s.quantity) OVER (partition BY s.prod_id order by s.prod_id) AS max_q
       FROM sales_order_items s ) as order_qty,
     product p, sales_order o
WHERE p.id = prod_id and o.id = order_qty.id and p.quantity < max_q
ORDER BY p.id, o.id

SELECT o.id, o.order_date, p.*
FROM sales_order o, sales_order_items s, product p
WHERE o.id = s.id and s.prod_id = p.id
  and p.quantity < (SELECT max(s2.quantity)
                    FROM sales_order_items s2
                    WHERE s2.prod_id = p.id)
ORDER BY p.id, o.id


WINDOW functions

Find the salespeople with the best sales (total amount) for each product, including ties:

SELECT v.prod_id, v.sales_rep, v.total_quantity, v.total_sales
FROM ( SELECT o.sales_rep, s.prod_id,
              SUM(s.quantity) as total_quantity,
              SUM(s.quantity * p.unit_price) as total_sales,
              RANK() OVER (PARTITION BY s.prod_id
                           ORDER BY SUM(s.quantity * p.unit_price) DESC) as sales_ranking
       FROM sales_order o KEY JOIN sales_order_items s KEY JOIN product p
       GROUP BY o.sales_rep, s.prod_id ) as v
WHERE sales_ranking = 1
ORDER by v.prod_id

SELECT s.prod_id, o.sales_rep,
       SUM(s.quantity) as total_quantity,
       SUM(s.quantity * p.unit_price) as total_sales
FROM sales_order o KEY JOIN sales_order_items s KEY JOIN product p
GROUP BY s.prod_id, o.sales_rep
HAVING total_sales = (SELECT FIRST SUM(s2.quantity * p2.unit_price) as sum_sales
                      FROM sales_order o2 KEY JOIN sales_order_items s2 KEY JOIN product p2
                      WHERE s2.prod_id = s.prod_id
                      GROUP BY o2.sales_rep
                      ORDER BY sum_sales DESC )
ORDER BY s.prod_id


Data Management in 9.0.2


Moving to ASA 9.0.2

If database is 8.0.2, unload/reload to 9.0 is largely unnecessary

DBUPGRAD to 9.0 required for some catalog schema changes, in particular for the Index Consultant

There should be no consequences of using DBUPGRAD with respect to performance

However:

only 9.0 format databases support named constraints

only 9.0 format databases support cache warming

only 9.0.1 databases support page checksums

8.0.2 databases do not support index statistics collection by default

Can be turned on when creating the database via CREATE DATABASE (but not dbinit)


Moving to ASA 9.0.2

Otherwise, unload/reload from 8.0.1 or 8.0.0 recommended

Clustered index support

Better statistics management

Improved histogram organization, statistics collection

Index statistics kept persistent in the database file

Improved histograms

Cache warming on startup

Checksums on database pages

PCTFREE option for base and temporary tables


Moving to SQL Anywhere “Jasper”

The Jasper release of the SQL Anywhere server will not support older database formats

Jasper will ship with a migration tool to convert an existing database into a Jasper-format database


Database organization

A database consists of up to 13 “dbspaces”

Maximum size of each dbspace is limited by the underlying operating system

Maximum database size is also determined by page size

Limit for any dbspace is 2**28 (256 million) pages

Each dbspace, the temporary file, and the transaction log is a simple OS file

Ease of administration, backup

Temporary file is used for temporary tables

A dbspace file grows in 256K extents (512K if 16K pages, 1Mb if 32K pages)

Database files can be copied to/from different endian machines

Can copy database from Wintel to big-endian UNIX systems and back again

Server automatically does data conversion where necessary
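The page-count limit translates into a size ceiling that depends on the page size. The sketch below is simple arithmetic on the figures from this slide (2**28 pages per dbspace, up to 13 dbspaces, and the stated extent sizes); the function names are hypothetical, and operating-system file size limits, which the slide notes also apply, are ignored.

```python
MAX_PAGES_PER_DBSPACE = 2 ** 28   # from the slide
MAX_DBSPACES = 13                 # from the slide


def max_dbspace_bytes(page_size_kb):
    """Upper bound on one dbspace, ignoring operating-system limits."""
    return MAX_PAGES_PER_DBSPACE * page_size_kb * 1024


def max_database_bytes(page_size_kb):
    """Upper bound on a database that uses all 13 dbspaces."""
    return MAX_DBSPACES * max_dbspace_bytes(page_size_kb)


def extent_kb(page_size_kb):
    """Growth increment of a dbspace file: 256K, 512K with 16K pages, 1M with 32K pages."""
    return {16: 512, 32: 1024}.get(page_size_kb, 256)


if __name__ == "__main__":
    for kb in (1, 2, 4, 8, 16, 32):
        gib = max_dbspace_bytes(kb) // 2 ** 30
        print(f"{kb:>2}K pages: {gib} GiB per dbspace, extent {extent_kb(kb)}K")
```

With 4K pages, for example, the bound works out to 2**40 bytes (1 TiB) per dbspace.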


Database organization

A database file contains:

table pages

index pages

free pages

rollback log pages

checkpoint log pages

Each dbspace for a database must use the same page size


Physical organization: tables

Each table uses an independent set of table pages

Each table allocates at least one page, even if the table is empty

Server maintains bit-maps for table pages

Supports clustering of table pages in the same portion of the database file

Facilitates large-block I/O – SQL Anywhere reads 64K at a time when doing sequential scans

Result: considerably faster sequential scan performance


Physical organization: tables

New in 8.0.2: ‘scattered read’ support on Windows 2000 and Windows XP

Another mainframe technology being reinvented on PC/UNIX servers

aka “locate-mode I/O”

Improves performance, reduces memory requirements

Coming to other platforms as vendors implement it

Tables cannot span dbspaces

Each secondary index on a table can be stored in a separate dbspace

Recommended if multiple spindles are available (not necessary for RAID devices)

Partition dbspaces on separate devices whenever possible

Brings more disk arms to bear, reducing seek latency


Physical organization: tables

Rows are inserted into pages at a point where, if at all possible, the entire row can be stored contiguously

Caveat: row segments are at most 4K; second or subsequent row segments can appear on different pages

Columns are packed tightly together; only unpadded values are stored on disk

Primary key columns are always at the beginning of each row, in sequence

Server may rewrite all rows if PK added or modified

Rows can be of (almost) unlimited size; are split across pages where necessary

Maximum length of any column is 2Gb

Maximum number of rows per page is 255


Physical organization: tables

Rows are not guaranteed to be placed in pages corresponding to their insertion order

By default, ASA uses a first-fit algorithm for page selection

To guarantee ordering of a result set, specify an ORDER BY clause

Space is not reserved for columns that are null

BLOB values are stored in a separate “arena” of pages

First 255 bytes are stored together with the row

Access to the rest of the BLOB value will almost certainly require a SEEK

Implications for choice of page size

Once inserted, a row identifier is immutable

An updated row must be split if its new length does not allow it to fit on the page


Physical organization: tables

Table pages are allocated in 8 page clusters; cluster allocation depends on page size

2K: grow 4 clusters at a time

4K: grow 2 clusters at a time

All other page sizes: one cluster at a time

ASA will re-use database pages for additional inserts if entire pages are freed

Defaults: for 1K pages, free space is 100 bytes; all other page sizes is 200 bytes

DBA can specify freespace percentage to accommodate future table UPDATEs using PCTFREE

PCTFREE characteristic stored in new catalog table SYSATTRIBUTE (and corresponding table SYSATTRIBUTENAME)

Can be specified for temporary tables
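The allocation rules on this slide reduce to a small amount of arithmetic. The sketch below uses hypothetical function names and only restates the slide's figures (8-page clusters, the per-page-size cluster counts, and the default free space per page); it does not model PCTFREE, whose exact accounting is not described here.

```python
PAGES_PER_CLUSTER = 8  # from the slide


def clusters_per_growth(page_size_kb):
    """2K pages grow 4 clusters at a time, 4K pages 2, all other sizes 1."""
    return {2: 4, 4: 2}.get(page_size_kb, 1)


def growth_kb(page_size_kb):
    """Kilobytes of table pages added each time a table grows."""
    return clusters_per_growth(page_size_kb) * PAGES_PER_CLUSTER * page_size_kb


def default_free_bytes(page_size_kb):
    """Default free space left in each page: 100 bytes for 1K pages, else 200."""
    return 100 if page_size_kb == 1 else 200


if __name__ == "__main__":
    for kb in (1, 2, 4, 8, 16, 32):
        print(f"{kb:>2}K pages: grow by {growth_kb(kb)}K, "
              f"default free space {default_free_bytes(kb)} bytes")
```

One observation from the numbers: for 2K, 4K, and 8K pages the growth step comes out to 64K in each case.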


Page sizes

Page sizes supported are 1K, 2K, 4K, 8K, 16K, 32K

2K page size minimum on all UNIX platforms

Default changed to 2K in the 6.0.3 release

A server can support several databases concurrently

Buffer pool page size will be the largest database page size specified on the command line

Consider tradeoffs with your choice of page size

4K recommended; occasionally 8K may offer improved performance

Default will change to 4K with Jasper release

Do not use 16K or 32K pages unless you have a specialty application

In typical environments, large page sizes cause inefficient use of cache


Choice of page size does matter

Larger rows usually require larger pages (requires fewer split rows)

Random retrieval performance is dependent on the application

Larger pages can pollute the cache with unnecessary data

Often require larger buffer pools to accommodate the application’s working set

Smaller pages are more cache efficient, but

Smaller pages reduce index fanout, and can increase index depth


Choice of page size does matter

Don’t ignore index maintenance costs when considering page size (larger page sizes can mean increased cache pressure)

Test your application with different alternatives

Your mileage may vary

A 4K page size is a typical choice for many applications

My recommendation: use 4K pages unless thorough testing proves that a different page size offers better performance/scalability

See data storage whitepaper

Available at www.ianywhere.com/developer

Recently updated for 9.0.0


Physical organization: indexes

ASA 9.0 supports two different types of indexes:

Hash-based

Key is a one-way order-preserving encoding of at most nine bytes of the data values

Hash-based indexes are still used when the key length does not satisfy the limits for compressed indexes

Compressed

Contains Patricia tries in the index’s internal nodes

Used for keys > 10 bytes and less than

122 bytes with 1K pages

248 bytes for all other page sizes

Substantially improved performance with larger keys


Physical index organization: hash-based indexes

Values in an index are “hashed” into a key of at most 10 bytes using an order-preserving encoding function

WITH HASH SIZE is deprecated

Each indexed column encoded separately, with a one-byte length

A 10-byte hash value can hold two 32-bit integer values (including two length bytes)

Hash values in an index are stored separately from the index entry itself

The hash value for an identical secondary key is shared for each index entry (row) in that index page

This improves fanout when data distribution is skewed
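An order-preserving encoding of this general shape can be illustrated as follows. This is a hypothetical sketch, not ASA's actual encoding: it assumes each 32-bit integer column is written as a one-byte length followed by four big-endian bytes with the sign bit flipped, so that byte-wise comparison of the encoded keys agrees with numeric comparison, and it truncates the result to the 10-byte limit from the slide.

```python
import struct

MAX_HASH_BYTES = 10  # limit stated on the slide


def encode_int32(value):
    """One length byte plus 4 data bytes; adding 2**31 flips the sign bit so
    unsigned byte order matches signed numeric order. (Assumed encoding.)"""
    if not -2 ** 31 <= value < 2 ** 31:
        raise ValueError("value does not fit in a signed 32-bit integer")
    return bytes([4]) + struct.pack(">I", value + 2 ** 31)


def hash_key(*columns):
    """Concatenate the per-column encodings and keep at most 10 bytes."""
    return b"".join(encode_int32(c) for c in columns)[:MAX_HASH_BYTES]


if __name__ == "__main__":
    keys = [(5, -1), (-3, 7), (5, -20), (0, 0)]
    # Sorting by encoded bytes gives the same order as sorting the tuples.
    print(sorted(keys, key=lambda k: hash_key(*k)) == sorted(keys))
    # Two 32-bit columns fill the key exactly: 2 * (1 + 4) bytes.
    print(len(hash_key(1, 2)))
    # A third column is truncated away, so these two keys collide.
    print(hash_key(1, 2, 3) == hash_key(1, 2, 4))
```

The collision in the last line shows why a truncated key cannot settle every comparison by itself: when the encoded prefixes are equal, the full column values have to be consulted.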


Physical index organization: Compressed indexes

Internal nodes in the index contain a Patricia trie

PATRICIA: Practical Algorithm to Retrieve Information Coded in Alphanumeric (D. R. Morrison, J. ACM Vol. 15, 1968)

Combines a binary trie with an optimization to skip over bit comparisons that would result from one-way branching

Result: automatic compression of string data

Excellent fanout of internal nodes

Common substrings of key values have a negligible impact on space requirements and performance

Superb performance improvements in many cases, especially with composite primary and foreign keys


Clustered index support

First offered with the 8.0.2 release

At most one clustered index per table (may be a temporary table)

May be secondary index, PK, FK, UNIQUE constraint

Optimizer assumes PK indexes are clustered unless a different clustering index exists

Engine will not attempt to maintain clustering on PK indexes unless they are declared CLUSTERED

May be hash or compressed index

Clustering characteristic stored in SYSATTRIBUTE catalog table

CLUSTERED keyword can be used in both CREATE INDEX and CREATE/ALTER TABLE statements

However, ALTER does not reorganize the table; use REORGANIZE TABLE


Clustered index support

On INSERT/LOAD TABLE, server attempts to keep rows physically adjacent in base table pages

Specification of PCTFREE on LOAD can be critical

Adjacency is NOT guaranteed; ORDER BY still requires a physical sort or indexed retrieval

Can significantly improve performance

Optimizer costs clustered index access differently

Consider their use with queries that involve range predicates

Often useful with DATE or TIMESTAMP columns

Use REORGANIZE TABLE or UNLOAD/RELOAD if clustering degrades over time

ALTER INDEX statement can rename an index or change its clustering attribute


Physical index organization: fanout and page size

Fanout refers to the number of index entries on a page

Lower fanout means greater index depth, and hence more costly random retrieval

Fanout is affected by

Page size

Hash value size/trie compression

Distribution of key values

Index maintenance

Fanout can degrade over time

sa_index_density() procedure
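The relationship between fanout and depth can be estimated with a short calculation. The sketch below assumes an idealized index in which every page at every level holds exactly `fanout` entries; real indexes are less regular, and the function name is hypothetical.

```python
def index_depth(rows, fanout):
    """Number of index levels needed to address `rows` entries when every
    page holds `fanout` entries (idealized, uniformly full pages)."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    depth, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout  # each extra level multiplies the reach by the fanout
        depth += 1
    return depth


if __name__ == "__main__":
    for fanout in (400, 100, 50, 10):
        print(f"fanout {fanout:>3}: depth {index_depth(1_000_000, fanout)}")
```

Under this model, one million entries need three levels at a fanout of 100 but four at a fanout of 50, so a drop in fanout costs an extra page access on each random retrieval.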


Indexes and query processing

ASA does not store actual data values in the index

implies each base row must be retrieved to

Fetch the values of any attributes, or

To compare keys longer than the maximum hash value size

Indexes are automatically created to enforce referential integrity

Primary keys, foreign keys, unique constraints

All related indexes must be the same type (hash or compressed)

Maximum number of indexes is dependent on page size

<= 4K: 2048 indexes

8K: 1024 indexes

16K: 512 indexes

32K: 256 indexes


Indexes and query processing

Each indexed column can be ascending or descending

Index is scanned backwards if the application scrolls in the opposite direction, or an ORDER BY clause specifies the reverse sequence

Support for merge and hash joins means that ASA will often use sequential scans, rather than indexed retrieval


REORGANIZE Statement – base tables

REORGANIZE TABLE tablename

Defragments rows on-the-fly by removing/inserting groups of rows in clustered index (or PK) order

Exclusive lock held on the table while a group is processed; commits occur periodically to enable other applications to run, checkpoints are suspended while the group is being processed

Performs implicit COMMITs during operation

Rows will be in clustered sequence when operation is complete (except possibly concurrent UPDATES)

Use new procedure sa_table_fragmentation() to discover tables that warrant reorganization


REORGANIZE Statement - indexes

REORGANIZE TABLE tablename [ index specification ]

INDEX indexname

FOREIGN KEY indexname

PRIMARY KEY

Exclusive lock is held throughout

CHECKPOINTs are suspended

Reclaims space lost to update activity

Re-balances the index, especially important after many DELETE operations

Use the new procedure sa_index_density() to identify indexes that require reorganization


Data management improvements in 9.0.1

Better scalability – new lock-free cache manager

Substantially better performance across the board

Support for page checksums

New option for dbinit and CREATE DATABASE statement

Supported by dbvalid utility, and a new statement VALIDATE CHECKSUM

Overhead: largely depends on CPU speed. Examples:

2.8 milliseconds per I/O for 32K pages

0.7 milliseconds per I/O for 8K pages

Improvements to dynamic cache sizing

Sampling rate changes with database growth or the starting of a new database on the same server


Data management improvements in 9.0.1

Database cache warming feature

Two operational phases, collection and reload

During collection, page IDs are saved in the database as they are accessed at startup

During reload, collected page IDs are read into cache as background processing

Checks and balances used to prevent swamping the server with I/O during server startup

Need to test performance before deploying

Cache warming is *enabled* by default


Data management improvements in 9.0.1

Optimistic locking introduced for WAIT_FOR_COMMIT

Controlled by a new connection option OPTIMISTIC_WAIT_FOR_COMMIT

Temporary dbspace can be grown with ALTER DBSPACE

Can improve performance of complex queries by ensuring that the temp file is not fragmented on disk

Size of temporary dbspace can be controlled with a governor

New public option TEMP_SPACE_LIMIT_CHECK (default OFF)

When OFF, engine’s default behaviour is to die with a DISK FULL error

Jasper release: default is ON

Server computes a temp space quota for each request; if quota is exceeded and temporary dbspace is at least 80% of its maximum size, request fails with SQLSTATE 54W05

Quota computed using amount of disk free space on that partition, and number of active connections

Shipped in 9.0.0 build 1308, 9.0.1 build 1872, 8.0.3 build 4991


Data management improvements in 9.0.1

ALTER INDEX statement

Can rename an index, or alter its clustering attribute

Ability to create an index on a function

Automatically adds a computed column “column-name” to the table

Creates an index on the computed column

Relies on the optimizer to replace any function occurrences with the computed column

CREATE INDEX index-name ON [owner.]table-name ( function( arg [, ...] ) AS column-name ) [{IN | ON} dbspace-name]


Data management improvements in 9.0.1

Non-transactional temporary tables

Unaffected by COMMIT or ROLLBACK; no entries made to rollback log

Procedure, trigger, and view text can be hidden from other users by using SET HIDDEN (8.0.2)

LOAD TABLE enhancements:

can be used on local temporary tables (8.0.2)

ORDER clause (8.0.2)

Control over which column histograms are built (9.0.0)


Data management improvements in 9.0.1

DEDICATED_TASK option (DBA-only, temporary only)

UUIDs and GUIDs can be used as surrogate keys - see newid() function (8.0.2)

XML data type

SYSHISTORY system table

Statistics (depth, leaf pages) maintained on indexes in real time (introduced in 8.0.2EBF)

Hash(), compress(), encrypt() builtin functions

Can be used to compress or encrypt individual string or binary fields in the database

Values can be viewed, processed with decrypt() and decompress() functions


Data management improvements in 9.0.1

ALTER DATABASE can now modify transaction log identically to DBLOG utility

BACKUP and DBBACKUP can now rename the log copy

ALTER VIEW WITH RECOMPILE

Event handling improvements:

Two new parameters for event_parameter:

APPINFO

DisconnectReason: ‘from client’, ‘drop connection’, ‘liveness’, ‘inactive’, ‘connect failed’

New cost model for Ultralite requests

New DTT function based on analysis of several current models of pocket PC devices

Equates random and sequential I/O to produce better Ultralite query plans


Data management improvements in 9.0.2

Temporary stored procedures

they are visible only by the connection which creates them, and are automatically dropped when the connection is dropped.

they can be explicitly dropped, but may not be ALTERed.

GRANT and REVOKE are not permitted on temporary procedures.

they are not recorded in the catalog or in the transaction log

they can be created and dropped when connected to a read-only database

a procedure owner cannot be specified for temporary procedures. Rather, they are owned by the user that creates them.

temporary external procedures are not permitted

temporary procedures execute with the permissions of their creator (i.e. the current user)


Data management improvements in 9.0.2

CREATE LOCAL TEMPORARY TABLE

defines a local temporary table which will persist until the end of a connection, or until the table is explicitly dropped.

Intended for use inside procedures, functions, triggers

Similar to DECLARE LOCAL TEMPORARY table if executed outside of a procedure context

UUIDs are now a native SQL Anywhere type

UUID_HAS_HYPHENS option

Controls formatting of UUIDs (UniqueIdentifier values) when converted to strings

Disk-full callback support

MIN_TABLE_SIZE_FOR_HISTOGRAM is deprecated

New option COLLECT_STATISTICS_ON_DML_UPDATES

New option LOG_DEADLOCKS, sa_report_deadlocks() procedure

Enhancements to START DATABASE statement: WITH DISTINCT SQLSTATE


Application profiling improvements in 9.0.2

Procedure profiling can now be performed for an individual connection or user

call sa_server_option('Profile_connection',<connection-id>)

call sa_server_option('ProfileFilterUser','<userid>')

Request-level logging enhancements:

New -zn switch to retain n log files in a ring

Or use sa_server_option('RequestLogNumFiles',<n>)

Can log either text or the plan for expensive queries (9.0.2EBF)

-zx <cost> specifies the threshold cost; a statement is logged if its cost exceeds the threshold at either optimization or execution time

Call sa_server_option('LogExpensiveQueries')

When -zp is also specified, the plans are output; otherwise, only the statement text is logged


Physical database design tips


Physical database design tips: file placement

Database file placement

Place transaction log, database file(s), and temporary directory on separate devices if possible

if using mirrored logging, ensure the two logs are on different physical disks

Temporary file placement can dramatically affect performance of complex queries

Use the ASTMP environment variable to specify location for temporary file

Place on a different physical drive if possible

The more disk heads the better (RAID)
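A quick way to sanity-check file placement is to compare the device that holds the database directory with the device that holds the directory named by ASTMP. The Python sketch below is only a rough aid: `st_dev` identifies a volume or file system, which is not necessarily a distinct physical disk, and the helper names are hypothetical.

```python
import os

def same_device(path_a: str, path_b: str) -> bool:
    """True if both paths live on the same volume (as reported by the OS)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

def check_temp_placement(db_dir: str) -> str:
    """Report whether the ASTMP directory shares a volume with db_dir."""
    tmp_dir = os.environ.get("ASTMP")
    if tmp_dir is None:
        return "ASTMP is not set"
    if same_device(db_dir, tmp_dir):
        return "temporary directory shares a device with the database"
    return "temporary directory is on a separate device"

print(check_temp_placement(os.getcwd()))
```

The check raises an error if ASTMP points at a directory that does not exist, which is itself worth knowing before starting the server.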

Page 61: AM18 ASA INTERNALS: DATA MANAGEMENT

61

Physical database design tips: file placement

Consider the use of caching disk controllers/NT striping/RAID

Consider the tradeoffs

Software striping offers better performance, but offers no recovery advantages

RAID 5 tends to have poor write request latency: each I/O turns into four write requests that take place serially

Not good for a transaction log

RAID 10 (1+0) offers much better performance, at the cost of redundancy
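The latency argument can be made concrete with back-of-the-envelope arithmetic. The Python sketch below takes the slide's model for RAID 5 (four requests issued serially per logical write) and assumes, as an illustration only, that RAID 10 issues its two mirror writes in parallel; real controllers with write caches will behave differently.

```python
def write_latency_ms(level: str, per_io_ms: float) -> float:
    """Rough latency of one small logical write under a simplified model."""
    if level == "raid5":
        # Slide's model: one logical write becomes four serial requests.
        return 4 * per_io_ms
    if level == "raid10":
        # Assumption: both mirror copies are written in parallel.
        return per_io_ms
    raise ValueError("unknown RAID level: " + level)

for level in ("raid5", "raid10"):
    print(level, write_latency_ms(level, per_io_ms=5.0), "ms")
```

Because every COMMIT waits for a transaction log write, a per-write latency multiplier of this kind translates directly into lower commit throughput.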

Page 62: AM18 ASA INTERNALS: DATA MANAGEMENT

62

File system considerations

Defragment your file system occasionally, especially after an unload/reload

Database file fragmentation is now displayed in the console window when the database is started

Preallocate large quantities of space in contiguous chunks through the ALTER DBSPACE command

Less problematic with 256K block allocation in recent ASA releases

ALTER DBSPACE <dbspace-name> ADD nnn {PAGES | KB | MB | GB | TB}

Can also do this for the TEMPORARY DBSPACE

Use db_extended_property() function to determine fragmentation/size of each dbspace individually (new in 9.0, also in 8.0.2.4215)

Can be done for temporary dbspace and the transaction log as well

Page 63: AM18 ASA INTERNALS: DATA MANAGEMENT

63

File system considerations

Use caution when trying to run the database over a networked drive!

Not all networks and/or operating systems guarantee network packet ordering

Physical or logical corruption is likely

Can use “-r” (read-only) switch if necessary

SAN units are supported; they guarantee consistent semantics

Do not use cached filesystem writes unless persistence is guaranteed

Corruption is virtually certain and database cannot be recovered; will need to restore database from backup

Page 64: AM18 ASA INTERNALS: DATA MANAGEMENT

64

Database fragmentation

ASA databases never shrink

Free pages will be reused for other purposes

Unload/reload will recover this unused space

If data is removed in the order it was inserted, fragmentation is less likely

Avoid inserts of NULL values followed by updates with actual data

Use PCTFREE if necessary

Repair fragmentation with unload/reload, or REORGANIZE TABLE

Useful tools

DBINFO -u

Stored procedure sa_table_fragmentation()
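Why NULL-then-update patterns fragment rows, and why reserving free space helps, can be seen in a toy page model. The Python sketch below is a deliberate simplification (fixed-size rows, no page headers, no real row-splitting logic); the function name and its parameters are hypothetical and do not reflect ASA's actual page layout.

```python
def count_split_rows(n_rows, page_size, initial_size, final_size, pctfree):
    """Count rows that cannot grow in place after being inserted small."""
    # Inserts may only fill a page up to (100 - pctfree) percent.
    usable = page_size * (100 - pctfree) // 100
    pages = []  # each page is a list of row sizes
    for _ in range(n_rows):
        if not pages or sum(pages[-1]) + initial_size > usable:
            pages.append([])
        pages[-1].append(initial_size)

    splits = 0
    for page in pages:
        used = sum(page)
        for i in range(len(page)):
            growth = final_size - page[i]
            if used + growth <= page_size:
                page[i] = final_size   # row grows in place
                used += growth
            else:
                splits += 1            # row would spill to another page
    return splits

print(count_split_rows(100, 100, 10, 20, pctfree=0))
print(count_split_rows(100, 100, 10, 20, pctfree=50))
```

In this model, rows that double in size after insertion all spill when pages are packed full, and none spill when half of each page is held back; the cost of the reserve is twice as many pages at load time.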

Page 65: AM18 ASA INTERNALS: DATA MANAGEMENT

65

Physical database design tips: tables

Load table data in clustering order (by default, primary key sequence)

Sorting automatically performed by DBUNLOAD and by the REORGANIZE TABLE statement

New ORDER syntax for (UN)LOAD TABLE

Use 4K pages unless conditions warrant otherwise

Watch for ordering, placement of PK columns

Order in table dictates order in index

Changed in Jasper!

Rows are rewritten if PK columns, or column order, is changed

Page 66: AM18 ASA INTERNALS: DATA MANAGEMENT

66

Physical database design tips: tables

Use of out-of-range default values instead of NULL

Reduces page fragmentation with updates

Can use PCTFREE as an alternative

Put large columns at end of row; fixed-size and frequently-accessed columns near start

Prevent seeks to another table page, required to access split rows

Choose your data types with care; trade off storage efficiency against application requirements

For keys, alphanumeric strings are often more flexible

Page 67: AM18 ASA INTERNALS: DATA MANAGEMENT

67

Physical database design tips: indexes

Compressed indexes prevent many of the problems with relatively large or composite primary keys

However:

Surrogate keys can still be useful

Usually not a good idea for significant business objects to have the same key format

Self-checking keys can simplify business processing

Watch for opportunities to specify a clustering index

Especially with date or timestamp columns used in range queries

Useful stored procedures:

sa_index_levels()

sa_index_density()
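One common way to build a self-checking key is to append a check digit. The Python sketch below uses the Luhn algorithm purely as an example; the presentation does not say which scheme is intended, and the function names are hypothetical.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    for position, ch in enumerate(reversed(payload)):
        digit = int(ch)
        if position % 2 == 0:      # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10

def is_valid_key(key: str) -> bool:
    """True if the last digit of key is the Luhn check digit of the rest."""
    if not key.isdigit() or len(key) < 2:
        return False
    return luhn_check_digit(key[:-1]) == int(key[-1])

print(luhn_check_digit("7992739871"))
print(is_valid_key("79927398713"))
```

A key that fails the check can be rejected by the application before any database lookup takes place, which is where the simplification in business processing comes from.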

Page 68: AM18 ASA INTERNALS: DATA MANAGEMENT

68

Physical database design tips: surrogate keys

Consider surrogate keys when appropriate

Exploit autoincrement support, or develop self-checking keys to simplify error detection

9.0 and 8.0.2 support automatic generation of universal unique identifiers (UUIDs) as surrogate keys

Compatible with Microsoft’s implementation

New native domain: uniqueidentifier in 9.0.2

No longer necessary to use string conversion functions such as uuidtostr(); type conversion done automatically

Trade off their characteristics against GLOBAL AUTOINCREMENT
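The tradeoff can be illustrated by comparing how the two schemes avoid collisions. With GLOBAL AUTOINCREMENT, each database draws values from its own partition: for partition size p and global database id n, the range is pn + 1 to p(n + 1). The Python sketch below computes that range and contrasts it with a UUID; the function name is hypothetical and the partition size used is an arbitrary example.

```python
import uuid

def global_autoincrement_range(database_id: int, partition_size: int):
    """Key range owned by one database under partitioned autoincrement."""
    low = partition_size * database_id + 1
    high = partition_size * (database_id + 1)
    return low, high

# Each database owns a disjoint block of compact integer keys.
print(global_autoincrement_range(0, 1000))
print(global_autoincrement_range(2, 1000))

# A UUID needs no per-database configuration but occupies 16 bytes.
print(len(uuid.uuid4().bytes))
```

Partitioned integers are small and index well but require every database to be assigned a distinct id and a partition large enough for its lifetime; UUIDs need no coordination but are larger and arrive in no useful order.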

Page 69: AM18 ASA INTERNALS: DATA MANAGEMENT

69

Physical database design tips: foreign keys

Foreign keys are essential to the optimization of complex queries

Join selectivity and cardinality estimation is much more accurate when foreign key constraints are present

Also enable a variety of query rewrite optimizations

But weigh the tradeoffs of using declarative referential integrity

Downside is the maintenance cost for indexes that are not utilized in query processing

In rare situations, consider eliminating some RI and check constraints once application is fully tested

Page 70: AM18 ASA INTERNALS: DATA MANAGEMENT

70

Physical database design tips: triggers, constraints

Use declarative referential integrity instead of triggers

Use CHECK constraints rather than triggers for simple conditions

9.0 supports named constraints

Unnamed constraints are automatically named as ‘ASAnnn’

Mark columns as NOT NULL when appropriate

Don’t over-use CHECK constraints

e.g. in user-defined data types

Using a user-defined function in a CHECK constraint will guarantee poor update performance

Page 71: AM18 ASA INTERNALS: DATA MANAGEMENT

71

Server configuration tips: cache size

Dynamic cache sizing is instituted by default on platforms that support it

Not supported for CE, NetWare

Can override dynamic cache sizing as necessary

Server can dynamically adjust cache size depending on server workload; this is more robust in 9.0.1

Use -ch to specify an upper bound larger than 256MB

If specifying cache size at startup:

Need to allow for OS and application overhead

CE has different defaults than other platforms

Java-enabled databases require a larger minimum cache for the Java VM; 8MB is usually sufficient

Watch for NT File Cache competition

See white paper on memory usage (available at http://www.ianywhere.com/developer)

Page 72: AM18 ASA INTERNALS: DATA MANAGEMENT

72

Data management in Jasper

Statements concerning iAnywhere Solutions' new products are forward-looking statements that involve a number of uncertainties and risks and cannot be guaranteed. Factors that could ultimately affect such statements are detailed from time to time in Sybase's Securities and Exchange Commission filings, including but not limited to its annual report on Form 10-K and its quarterly reports on Form 10-Q (copies of which can be viewed on the Company's website).

All of the information in this presentation consists of forward-looking statements, as defined above. As such, there is uncertainty associated with if or when any of these features will be added to the product.

Page 73: AM18 ASA INTERNALS: DATA MANAGEMENT

73

Data management changes in Jasper

Default page size changed to 4K

New catalog implementation

Catalog base tables have been renamed

All catalog access by applications is through views

Catalog base tables are reorganized, more efficient

View dependencies on base tables and views are now tracked

Improved storage organization for BLOB columns

In-row BLOB prefix default is no longer fixed at 254:

CHAR/VARCHAR: minimum 8, maximum 128

BINARY/VARBINARY: minimum 0, maximum 256

Can override on per-column basis

New storage architecture for long values, permits efficient random access

Page 74: AM18 ASA INTERNALS: DATA MANAGEMENT

74

View dependency tracking

Three states for any view:

Valid: compiled and active, can be utilized in queries

Invalid: view has been invalidated by the server due to dependency checking as a result of DDL on base tables

Upon reference, the server will attempt to compile the view and use it if possible

Otherwise, query will get an error

Disabled: view has been explicitly disabled (via new statement, DISABLE VIEW), and is unusable

View must be explicitly enabled in order to become valid (via new statement, ENABLE VIEW)

Page 75: AM18 ASA INTERNALS: DATA MANAGEMENT

75

View dependency tracking

Upon an ALTER (or DROP):

Server attempts to acquire an exclusive lock on the object to be modified

Server honours the current setting of the BLOCKING option

Server then acquires exclusive locks on all dependent views

If any lock cannot be acquired, the statement gets an error

Once locked, all dependent views are invalidated

ALTER (or DROP) statement is executed

With ALTER, the server attempts to revalidate all the previously invalidated views

Views successfully recompiled are marked as valid

Otherwise, the view is left in the invalid state

Server will attempt to recompile it when:

First referenced in a server session, or

When other DDL is performed that may affect that view
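The view states and the invalidate/revalidate sequence can be sketched as a small state machine. The Python below is a toy model only: locking is omitted, "compiling" a view is reduced to checking that every referenced column still exists, and all class and function names are hypothetical.

```python
from enum import Enum

class ViewState(Enum):
    VALID = "valid"
    INVALID = "invalid"
    DISABLED = "disabled"

class View:
    def __init__(self, name, depends_on):
        self.name = name
        self.depends_on = set(depends_on)   # e.g. {"t.a", "t.b"}
        self.state = ViewState.VALID

def try_compile(view, catalog):
    """Hypothetical compile step: every referenced column must exist."""
    return view.depends_on <= catalog

def alter_table(table, new_catalog, views):
    """Invalidate dependent views, apply the change, then revalidate."""
    prefix = table + "."
    dependents = [v for v in views
                  if v.state is ViewState.VALID
                  and any(d.startswith(prefix) for d in v.depends_on)]
    for v in dependents:
        v.state = ViewState.INVALID
    # The ALTER itself is represented by switching to new_catalog.
    for v in dependents:
        if try_compile(v, new_catalog):
            v.state = ViewState.VALID
    return new_catalog

def reference(view, catalog):
    """Use a view in a query, recompiling an invalid view on first reference."""
    if view.state is ViewState.DISABLED:
        raise RuntimeError(view.name + " is disabled")
    if view.state is ViewState.INVALID and try_compile(view, catalog):
        view.state = ViewState.VALID
    if view.state is not ViewState.VALID:
        raise RuntimeError(view.name + " cannot be compiled")
    return view.name
```

For example, dropping column `t.b` leaves a view over `t.a` valid and a view over `t.b` invalid; the second view becomes valid again on first reference once `t.b` is restored. A disabled view never recovers on its own, matching the requirement for an explicit enable.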

Page 76: AM18 ASA INTERNALS: DATA MANAGEMENT

76

Internationalization improvements

Support for NCHAR data type

NCHAR strings are stored as UTF-8

NCHAR specification and functions use character semantics, not byte semantics

NCHAR(10) means 10 characters (1-4 bytes per character)

CHAR specification now supports either BYTE or CHAR modifier

E.g. CHAR(10 BYTE) or CHAR(23 CHAR)

NCHAR can support either:

UCA (Unicode Collation Algorithm) using IBM’s ICU library; properly supports multi-byte character sorting

A legacy collation stored as UTF-8

Database now can have two collations, one for NCHAR, one for CHAR

Details in session SQL506 Monday afternoon
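The difference between character and byte semantics is easy to demonstrate with UTF-8 strings. The Python sketch below treats a Python code point as one "character" and assumes, for illustration, that the CHAR data is also encoded as UTF-8; the helper names are hypothetical.

```python
def byte_length(s: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))

def fits_char_semantics(s: str, n: int) -> bool:
    """Models NCHAR(n) or CHAR(n CHAR): the limit counts characters."""
    return len(s) <= n

def fits_byte_semantics(s: str, n: int) -> bool:
    """Models CHAR(n BYTE): the limit counts encoded bytes."""
    return byte_length(s) <= n

text = "日" * 10
print(len(text), byte_length(text))
print(fits_char_semantics(text, 10), fits_byte_semantics(text, 10))
```

Ten CJK characters occupy 30 bytes in UTF-8, so the same value fits a 10-character column but not a 10-byte column.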

Page 77: AM18 ASA INTERNALS: DATA MANAGEMENT

77

Indexing changes

New index implementation

Improved implementation of compressed B-tree indexes

Key values are duplicated in the index to support index-only retrieval and snapshot isolation

Older “hash”-based indexes have been dropped entirely

Index column order for primary keys now based on PK constraint declaration, not column order in table

PK can be altered, reordered without rewriting all the rows in the table

Order specification can now be specified with any constraint index, e.g. PRIMARY KEY (X ASC, Y DESC, Z ASC)

Foreign key column order can now be different than that of PK

All indexes now appear in the SYSINDEXES view

Planned:

Ability to declare that a FK is unique (to enforce a 1:1 relationship)

Abstract indexes into logical and physical implementations

Redundant indexes will not be created
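The ordering implied by a declaration such as PRIMARY KEY (X ASC, Y DESC, Z ASC) can be reproduced in Python with a stable multi-pass sort. This only illustrates the resulting key order, not how the server builds its B-tree; the function name and the `spec` format are hypothetical.

```python
def index_order(rows, spec):
    """Order rows by a list of (column_position, descending) pairs.

    spec lists columns from most to least significant. Sorting by the
    least significant column first works because Python's sort is stable,
    including when reverse=True.
    """
    ordered = list(rows)
    for position, descending in reversed(spec):
        ordered.sort(key=lambda row: row[position], reverse=descending)
    return ordered

rows = [(1, 1, 1), (1, 2, 1), (0, 5, 2), (1, 2, 0)]
# X ascending, Y descending, Z ascending
print(index_order(rows, [(0, False), (1, True), (2, False)]))
```

A query whose ORDER BY matches the index's mix of ascending and descending columns can read rows in index order instead of sorting them.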

Page 78: AM18 ASA INTERNALS: DATA MANAGEMENT

78

Shareable global temporary tables

Shared global temporary tables

New syntax: CREATE GLOBAL TEMPORARY TABLE ….. SHARE BY ALL

The contents of the table will persist until explicitly deleted or until the database is shut down. On database startup, the table will be empty.

Row locking on shared temporary tables behaves the same as for permanent tables

Inserts, updates and deletes on shared temporary tables are not recorded in the transaction log

Column statistics are maintained in memory by the server.

Page 79: AM18 ASA INTERNALS: DATA MANAGEMENT

79

Data management changes in Jasper

Last modification time for any row in a table now retained in SYSTABLE

Resolution is one second

LOAD TABLE enhancements: better performance, ENCODING option, ROW DELIMITED BY option

Apply multiple transaction logs at startup (can specify a directory)

Better row-level locking implementation

Elimination of key-range locking with anti-insert locks

Planned: introduction of INTENT locks (e.g. FETCH FOR UPDATE)

Improved administration of large databases:

Parallel backup

Auto-tuning to exploit multiple CPUs on SMP hardware

Faster unload/reload, index creation, database validation

Page 80: AM18 ASA INTERNALS: DATA MANAGEMENT

80

Database mirroring

Provides “hot” failover for a SQL Anywhere database

Involves two or three separate servers: primary, mirror, arbiter

Transaction log pages are passed from the primary server to the mirror to keep the mirror up-to-date

Mirror server is not accessible by any other connections

Effectively the mirror server is in continuous recovery mode

Log pages can be passed in three modes:

Synchronously (default) on COMMIT

Asynchronously on COMMIT – better performance than synchronous mode

Asynchronously when log page is full, with a timeout option

Async implies the usual caveats with possible lost transactions

Role switch occurs if primary server fails

Arbiter used to verify the mirror state before role switch proceeds

Clients are disconnected from the primary server

Must reconnect to the mirror

See TechWave session SQL508 – High Availability ASA on Wednesday

Page 81: AM18 ASA INTERNALS: DATA MANAGEMENT

81

Snapshot isolation support

Provides read-consistency in the face of concurrent writes from other transactions (e.g. writers do not block readers)

Enabled by a global database option, allow_snapshot_isolation

Three new transaction isolation levels:

“snapshot” – cleanest semantics, transaction sees a consistent view of the database as of transaction start (the time the first row was accessed)

“stmt-snapshot” – requires fewer resources; each statement sees a consistent state of the database, but different statements see it as of different times

Only one snapshot time exists for a connection; outermost or first statement sets the transaction time

“read-only-stmt-snapshot” – like stmt-snapshot, but only for queries; update statements execute at isolation level 1

Usage is not free

Old copies of rows are maintained in a “row version store” (part of the database’s temp file) for as long as necessary to ensure consistency for any transaction

Indexes have a mix of “old” and “current” values

Can affect the performance of both sequential and index scans
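The difference between transaction-level and statement-level snapshots can be shown with a toy version store. The Python sketch below keeps every committed version of a value together with a logical commit time; it ignores locking, update conflicts and version cleanup, and all class names are hypothetical rather than a description of the server's row version store.

```python
class VersionedTable:
    """Toy multi-version store: key -> list of (commit_time, value)."""

    def __init__(self):
        self.clock = 0
        self.versions = {}

    def commit_write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read_as_of(self, key, snapshot_time):
        visible = [v for t, v in self.versions.get(key, []) if t <= snapshot_time]
        return visible[-1] if visible else None

class SnapshotTransaction:
    """'snapshot': one snapshot time, fixed when the first row is accessed."""

    def __init__(self, table):
        self.table = table
        self.snapshot_time = None

    def read(self, key):
        if self.snapshot_time is None:
            self.snapshot_time = self.table.clock
        return self.table.read_as_of(key, self.snapshot_time)

class StatementSnapshotTransaction:
    """'stmt-snapshot': each read() models one statement with a fresh snapshot."""

    def __init__(self, table):
        self.table = table

    def read(self, key):
        return self.table.read_as_of(key, self.table.clock)

table = VersionedTable()
table.commit_write("x", 1)
snap = SnapshotTransaction(table)
stmt = StatementSnapshotTransaction(table)
print(snap.read("x"), stmt.read("x"))
table.commit_write("x", 2)           # a concurrent writer commits
print(snap.read("x"), stmt.read("x"))
```

The snapshot transaction keeps seeing the old value, which is why old row versions must be retained for as long as any such transaction is open.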

Page 82: AM18 ASA INTERNALS: DATA MANAGEMENT

82

Snapshot isolation support

Setting the isolation level:

set transaction isolation level snapshot

set transaction isolation level statement snapshot

set transaction isolation level read only statement snapshot

Or within an ODBC application, use:

SA_SQL_TXN_SNAPSHOT

SA_SQL_TXN_STATEMENT_SNAPSHOT

SA_SQL_TXN_READ_ONLY_STATEMENT_SNAPSHOT

Update conflicts are still possible

Isolation levels can be mixed (but not recommended)

Database property VersionStorePages contains the number of pages in the temp file devoted to copies of old rows

BLOB values do not reside in the temp file, but remain in the main database file and are reference counted

Some restrictions on DDL when snapshot transactions are in progress (ALTER TABLE, etc.)

Page 83: AM18 ASA INTERNALS: DATA MANAGEMENT

83

Lazy CHECKPOINTs

A Jasper server can now initiate a CHECKPOINT and perform other operations while it takes place.

In previous releases, all database activity would stop while the CHECKPOINT took place.

There can only be one CHECKPOINT in progress at a time.

If a CHECKPOINT is already in progress, then any operation like an ALTER TABLE or CREATE INDEX that wants to initiate a new CHECKPOINT needs to wait for the last one to finish.

Lazy checkpoints are not used if using the -m option

Documented by START CHECKPOINT and FINISH CHECKPOINT records in the transaction log

Page 84: AM18 ASA INTERNALS: DATA MANAGEMENT

84

Application profiling and request-level logging

Major enhancements in the Jasper release

Unified logging architecture

Can log data to a database, rather than a flat file

Can log data to a different database, even on another server

Much lower overhead

Considerably greater detail in diagnostic information:

Lock contention

Statements within stored procedures and triggers

Elapsed times

Query plans

Planned improvements to DBCONSOLE for real-time server status

Attend sessions SQL501/514 Tuesday afternoon at 1:30: ASA Performance Analysis from Start to Finish

Page 85: AM18 ASA INTERNALS: DATA MANAGEMENT

85

iAnywhere at TechWave 2005

Ask the iAnywhere Experts on the Technology Boardwalk (exhibit hall)

• Drop in during exhibit hall hours and have all your questions answered by our technical experts!

• Appointments outside of exhibit hall hours are also available to speak one-on-one with our Senior Engineers. Ask questions or get your yearly technical review – ask us for details!

TechWave ToGo Channel

• TechWave ToGo, an AvantGo channel providing up-to-date information about TechWave classes, events, maps and more – now available via your handheld device!

• www.ianywhere.com/techwavetogo

iAnywhere Developer Community - A one-stop source for technical information!

• Access to newsgroups, new betas and code samples

• Monthly technical newsletters

• Technical whitepapers, tips and online product documentation

• Current webcast, class, conference and seminar listings

• Excellent resources for commonly asked questions

• All available express bug fixes and patches

• Network with thousands of industry experts

http://www.ianywhere.com/developer/

Page 86: AM18 ASA INTERNALS: DATA MANAGEMENT

86

SQL Anywhere ‘Jasper’ Release

Learn more about 'Jasper', the upcoming SQL Anywhere release, loaded with features focused on:

• Enhanced data management including performance, data protection, and developer productivity

• Innovative data movement including manageability, flexibility and performance, and messaging

Attend the following sessions:

SQL Anywhere 'Jasper' New Feature Overview: Session SQL512 will be held Monday, August 22nd, 1:30pm

MobiLink 'Jasper' New Feature Overview: Session SQL515 will be held Wednesday, August 24th, 1:30pm

... and remember to look for sneak peeks in other sessions and morning education courses!

Register for the Jasper Beta program: www.ianywhere.com/jasper

Page 87: AM18 ASA INTERNALS: DATA MANAGEMENT

87

Questions?