IBM Information Management
© 2005 IBM Corporation
Indexing and Fragmentation Strategies
Informix Chat with the Labs
December 8, 2005
Mark Scranton
Worldwide Informix Technical Strategist
[email protected]
ANNOUNCING: IDUG / IIUG 2006 North America Conference
May 7th-11th – Tampa Convention Center, Tampa, Florida, USA
Attend in-depth Informix-specific educational seminars
Hear technical presentations by Informix R&D staff and fellow Informix users
Take advantage of networking opportunities
Visit products & services exhibitions
For more information, go to www.iiug.org/conf
To register, go to http://conferences.idug.org/namerica/2006/index.cfm
* Note: All registration is handled by IDUG – The International DB2 User Group (IDUG).
Who Am I?
WORK HISTORY
1995 – Oct 2004: Informix & IBM Education Group
– focused exclusively on IDS and XPS
– education and consulting
– user conferences & user groups
Oct 2004 – present: Worldwide Informix Technical Strategist
– User groups/conferences
– Technical proofs/benchmarks
– Product futures & direction
– Management “convincer”
– Customer Visits
– Competitive situations
PUBLICATIONS
Contributor: “The Informix Handbook”
Author: “Bringing IDS Internals to the Surface”
IBM Redbook(s)
WEBSITE - www.markscranton.com
Tips, tricks, monthly updates.
Presentations, white papers, scripts
MISCELLANEOUS
Advocacy Director - International Informix Users Group (IIUG – www.iiug.org)
Recipient – IIUG “Directors Award” for 2003
2005 INFORMIX ACTIVITY
Infobahns in 10 countries; ~ 30 cities
User Groups in 40 US cities
I am available for…
management convincing
client visits
local Informix user groups
technical roundtables
Get On The List! [email protected]
includes:
• late-breaking news!
• new technical content!
• IFMX-related events!
• current wind conditions wherever I am!
send email to [email protected] with SUBSCRIBE in the BODY (not SUBJECT)
The Informix-Flash Paper
Inform*Me … www.markscranton.com
Announcing: “The IDS 10.0 Cities Tour 2006”
targeting 10 large regional US cities
all-day in-depth technical presentations
– including well-known names in select cities
– best practices; IDS internals; performance considerations
– Informix education discounts at each city and giveaways
IDS Indexing & Fragmentation Strategies
Preface
“Indexing” and “Fragmentation” each could take days to cover exhaustively
– each is critical to achieving top IDS performance for medium-to-large sites
– a strong IDS foundational knowledge is necessary to cover these topics in depth
in the next hour, I will cover:
– some fundamentals of fragmentation and indexing
– I will handle the topics separately
– significant changes that have occurred in IDS v9.4 and v10.0
NOTE: any reference to a “fragment” or “structure” could mean a data fragment (table data) or an index fragment
Fragmentation Review
IDS Fragmentation Review : OLTP
OLTP characteristics:
high volume of short transactions
each transaction accesses a few rows
index access method is used.
For this environment:
Fragment the data by round robin or expression.
For large tables that receive a large percentage of transaction activity, fragment the indexes using an expression-based fragmentation strategy.
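As a rough sketch of the OLTP advice above: a busy table can spread its rows by round robin while its index is fragmented by expression. The table, index, and dbspace names (orders, ix_orders_cust, dbs1..dbs3, idx1..idx3) and the key ranges are hypothetical, and the dbspaces are assumed to exist already.

```sql
-- Hypothetical names throughout; dbs1..dbs3 and idx1..idx3 must already exist.
CREATE TABLE orders (
    order_num   INTEGER,
    customer_id INTEGER,
    order_date  DATE
)
FRAGMENT BY ROUND ROBIN IN dbs1, dbs2, dbs3;

-- Expression-based index fragments: a lookup on customer_id only needs
-- the one index fragment whose expression matches the key value.
CREATE INDEX ix_orders_cust ON orders (customer_id)
FRAGMENT BY EXPRESSION
    customer_id <  100000                          IN idx1,
    customer_id >= 100000 AND customer_id < 200000 IN idx2,
    customer_id >= 200000                          IN idx3;
```

The real ranges should come from knowing the data and the queries, which is the point of the slides that follow.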
IDS Fragmentation Review : Data Warehousing
DW characteristics:
low volume of long running queries
queries access most of the rows in each table
very few indexes are generally required
preferred access method is sequential scan
preferred join method is the hash join
For this environment:
• fragment elimination
• parallel scan the needed fragments
IDS Fragmentation
Tips:
– <database>:sysfragments system catalog has a ton of information on the fragments.
• a “detached index” becomes an entry in sysfragments versus sysindexes.
– round robin fragmentation is terribly easy to implement, but has very few benefits.
– expression-based fragmentation is much more difficult to implement, but the benefits can be superb.
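One way to see what sysfragments holds is to join it to systables. This is only a sketch: 'orders' is a hypothetical table name, and the query is meant to be run inside the database that owns the table.

```sql
-- 'orders' is a hypothetical table name.
SELECT f.fragtype,    -- 'T' = table fragment, 'I' = index fragment
       f.indexname,   -- filled in for index fragments
       f.strategy,    -- e.g. 'R' = round robin, 'E' = expression
       f.evalpos,     -- position of the fragment in the fragmentation list
       f.dbspace,
       f.exprtext     -- the fragment expression, for expression-based schemes
FROM   systables t, sysfragments f
WHERE  t.tabid   = f.tabid
AND    t.tabname = 'orders'
ORDER  BY f.fragtype, f.indexname, f.evalpos;
```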
Fragmentation Fact or Fiction
The primary consideration for when you should fragment a table is when it reaches X rows.
(answer on next slide…)
True or False
Fragmentation Fact or Fiction
Answer: False
While table size is important, the first two considerations must be:
query behavior & characteristics: fixed/canned or ad-hoc.
knowledge of the data – well-known or always unknown.
these two together will determine the fragmentation scheme, i.e., round-robin or expression.
The First Requirement
do not “wander into” fragmentation lightly.
the more complex the environment, the more homework required to set up effective fragmentation.
Do you know thy queries?
Do you know thy data?
(ok – the First Two requirements!)
Fragmentation and Extents
CREATE TABLE table_a (x INTEGER, y INTEGER, z CHAR(25))
FRAGMENT BY EXPRESSION
    x <= 10 AND x >= 1 IN tab_adbs1,
    x <= 20 AND x > 10 IN tab_adbs2,
    x <= 30 AND x > 20 IN tab_adbs3
EXTENT SIZE 120000 NEXT SIZE 60000;
initial extent for each fragment is 120000 KB (about 120 MB)
dbspaces: tab_adbs1, tab_adbs2, tab_adbs3
Fragmentation and Tablespaces
application view: one logical table
engine view: 3 structures (tab_adbs1, tab_adbs2, tab_adbs3)
tablespace = fragment = partition
tblsnum = fragid = partnum
Fragmentation and Tablespaces
Fun Facts
each fragment has its own partition page in the tblspace tblspace for that dbspace.
– each fragment can hit max extents or max table size.
the PARTNUM in <database>:systables will be “0” (zero) for a fragmented table
– partnum or fragid stored in <database>:sysfragments
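The partnum facts can be checked from the catalogs. The sketch below assumes table_a from the “Fragmentation and Extents” slide; frag_partnum is just an illustrative column alias.

```sql
-- For a fragmented table, systables.partnum is 0 and each fragment's
-- partition number is carried in sysfragments.partn.
SELECT t.tabname,
       t.partnum,                    -- expected to be 0 for table_a
       f.dbspace,
       HEX(f.partn) AS frag_partnum  -- one row per data fragment
FROM   systables t, sysfragments f
WHERE  t.tabid    = f.tabid
AND    t.tabname  = 'table_a'
AND    f.fragtype = 'T';
```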
IDS v10.0 Enhancement
pre-10.0:
– only one fragment per table in a single dbspace.
10.0+:
– multiple fragments per table in a single dbspace.
– an automatic partitioning* feature is being considered for vNext+; multiple fragments per dbspace make it possible.
* would automatically add a partition when a structure reaches max size or pages.
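In 10.0, fragments that share a dbspace are told apart by giving each one a partition name. A minimal sketch, assuming the same dbspaces as the earlier example; table_b and the names part1..part3 are hypothetical:

```sql
-- Two of the three fragments live in the same dbspace (tab_adbs1);
-- the PARTITION names keep them distinct.
CREATE TABLE table_b (x INTEGER, y INTEGER, z CHAR(25))
FRAGMENT BY EXPRESSION
    PARTITION part1 (x >= 1 AND x <= 10) IN tab_adbs1,
    PARTITION part2 (x > 10 AND x <= 20) IN tab_adbs1,
    PARTITION part3 (x > 20 AND x <= 30) IN tab_adbs2;
```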
Fragmentation Objectives
Parallelism: fragments are accessed in parallel, decreasing scan or insert time.
Fragment Elimination: unneeded fragments are eliminated, decreasing scan or insert time, and reducing disk contention.
Fragment Elimination & Parallelism: both goals are achieved.
[Figure: for each objective, scan threads* reading fragments; eliminated fragments are marked XX.]
* INSERTs, UPDATEs, SELECTs can also be done in parallel
Fragmentation Objectives : Fragment Elimination
cannot be done in:
– !=, IS NULL, IS NOT NULL
can be done in:
– the fetch portion of INSERT, UPDATE, SELECT or DELETE - when the SQL statements are optimized
– nested-loop joins – after key value from outer table is retrieved, elimination can occur when searching the inner table
– IN, =, <, >, >=, <=, AND, OR, NOT, MATCHES, LIKE
– range expressions combined with !=, IS NULL, IS NOT NULL
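To make these rules concrete, here is a small sketch against table_a from the “Fragmentation and Extents” slide (fragmented on x across tab_adbs1, tab_adbs2, tab_adbs3). The comments describe the elimination one would expect from the rules above; SET EXPLAIN ON is the usual way to confirm what the optimizer actually did on a given server.

```sql
SET EXPLAIN ON;   -- query plans are written to the explain output file

-- Equality on the fragmentation column: only the fragment that can
-- hold x = 15 (tab_adbs2) should need to be read.
SELECT * FROM table_a WHERE x = 15;

-- Range on the fragmentation column: tab_adbs1 can be skipped.
SELECT * FROM table_a WHERE x > 10 AND x <= 30;

-- Filter on a column that is not part of the fragmentation expression:
-- nothing can be eliminated, so all three fragments are candidates.
SELECT * FROM table_a WHERE y = 7;

SET EXPLAIN OFF;
```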
Fragmentation Objectives : Fragment Elimination
[Table: conditions under which the optimizer can or cannot eliminate fragments]
NOTE: there are more slides on this topic in the “Reference Material” section.
Fragmentation Objectives: Parallelism
X number of fragments accessed in parallel.
can cause device contention
– but the completion speed of the operation could outweigh that concern
default access scheme for “round robin” fragmentation
fragments cannot be eliminated with round robin
[Photo: “Parallelism in the Workplace”: 1 toy; 2 grandchildren being entertained in parallel*.]
* elimination is not appropriate.
Indexes: Attached & Detached
Semantic Clarification
Fragmentation Fact or Fiction
If you fragment your table data, and create an index on that table, it becomes fragmented by default.
(answer on next slide…)
True or False
Fragmentation Fact or Fiction
Answer: True
If you issue a CREATE INDEX… without specifying a storage clause/fragmentation scheme, the index is fragmented to follow the data into the respective dbspaces.
Note that the index pages are not interleaved with the data pages in the table extents – they have their own extents within the appropriate dbspace. This will be covered in the next section.
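A minimal sketch of that default, using table_a from the “Fragmentation and Extents” slide; the index name ix_table_a_x is hypothetical. The follow-up query is one way to see where the index fragments landed.

```sql
-- No IN or FRAGMENT BY clause: the index follows the table's
-- fragmentation, one index fragment per data fragment.
CREATE INDEX ix_table_a_x ON table_a (x);

-- List the index fragments and the dbspaces they were placed in.
SELECT f.indexname, f.strategy, f.dbspace
FROM   sysfragments f
WHERE  f.fragtype  = 'I'
AND    f.indexname = 'ix_table_a_x';
```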
Indexing: Fundamentals – Attached & Detached
7.3 and before: default was “attached”.
– data pages are interleaved with index pages within an extent.
– index fragment(s) will always be in the same dbspace(s) as the table fragment(s).
– an index fragment will only point to data in the table fragment occupying the same dbspace
9.2+: default is “detached”
– index pages are in their own extent(s)
– index fragments can be in the same or different dbspace than the data
– there is some confusion about the meaning of “detached”
Detached/Attached Indexes
create table …; onpload…; create index …;
[Figure: v7: dbspace1, a single extent with data and index pages interleaved (“attached”). v9.2+ default behavior: dbspace2, a data extent and a separate index extent (“attached or detached”???).]
Historical View of Detached
create table …; onpload…; create index … in <dbspace>;
OR
create table …; onpload…; create index … fragment by…;
[Figure: v7 AND v9.2+: data extent in dbspace1, index extent in dbspace2; labels “attached” and “detached”.]
Index Fragmentation
create table … fragment by…; onpload…; create index …;
[Figure: v7 AND v9: dbspace1 holding a data extent and a separate index extent (“attached or detached”???).]
Index Fragmentation Examples
Attached Index on a Fragmented Table
Large table DSS or OLTP environment.
Attractive index parallel scans.
Attractive index fragment elimination and smaller btrees.
Attractive scans on data pages in parallel.
Balanced I/O for indexes and data pages.
Detached Fragmented Index on a Non-fragmented Table
OLTP environment with high index hits vs. data page hits (key only reads).
Attractive index scans in parallel
Attractive index lookups with fragment elimination and smaller btrees.
Unattractive scans on data pages in series.
Detached Index on a Fragmented Table
DSS environment with some selective queries.
Attractive scans on data pages in parallel.
Unattractive index read in series.
Detached Fragmented Index on a Fragmented Table
Mixed OLTP and DSS environments, with data fragmented for DSS and the index fragmented for OLTP; or selective and non-selective queries on different columns in a DSS environment.
Attractive index parallel scans.
Attractive index fragment elimination and smaller btrees.
Attractive scans on data pages in parallel.
Balanced I/O for indexes and data pages.
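The four combinations in these examples can be sketched in DDL as follows. All table, index, and dbspace names (t1..t4, dbs1, dbs2, idx1, idx2) and the split points are hypothetical; the dbspaces are assumed to exist.

```sql
-- 1. Attached index on a fragmented table: no storage clause on the index.
CREATE TABLE t1 (id INTEGER, yr INTEGER, amount DECIMAL(10,2))
    FRAGMENT BY EXPRESSION
        id <  1000000 IN dbs1,
        id >= 1000000 IN dbs2;
CREATE INDEX ix_t1_id ON t1 (id);

-- 2. Detached fragmented index on a non-fragmented table.
CREATE TABLE t2 (id INTEGER, yr INTEGER, amount DECIMAL(10,2)) IN dbs1;
CREATE INDEX ix_t2_id ON t2 (id)
    FRAGMENT BY EXPRESSION
        id <  1000000 IN idx1,
        id >= 1000000 IN idx2;

-- 3. Detached (non-fragmented) index on a fragmented table.
CREATE TABLE t3 (id INTEGER, yr INTEGER, amount DECIMAL(10,2))
    FRAGMENT BY ROUND ROBIN IN dbs1, dbs2;
CREATE INDEX ix_t3_id ON t3 (id) IN idx1;

-- 4. Detached fragmented index on a fragmented table: data split one way
--    (by yr, for DSS scans), index split another way (by id, for lookups).
CREATE TABLE t4 (id INTEGER, yr INTEGER, amount DECIMAL(10,2))
    FRAGMENT BY EXPRESSION
        yr <  2005 IN dbs1,
        yr >= 2005 IN dbs2;
CREATE INDEX ix_t4_id ON t4 (id)
    FRAGMENT BY EXPRESSION
        id <  1000000 IN idx1,
        id >= 1000000 IN idx2;
```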
Indexing: Historical Issues w/ Btree Cleaner
v7.x – 9.3
– pages that were freed and reused could confuse the B-tree cleaner
– complex code required to invalidate requests
– single list caused contention
– single B-tree cleaner can get overwhelmed with large workloads
– no priority in cleaner requests
– long lists of committed deleted items left a bloated index
– a single btree cleaner would cause bloated indexes
RESULT: frequent rebuilds were necessary for efficiency
Enter Btree Scanners (9.4)
The workload for cleaning indexes will be prioritized
– the index which causes the server to do the most work will be the next index cleaned
An index will have its leaf level examined looking for deleted items
Dynamic configuration of threads to allow for configurable workloads
– can be added/dropped on-the-fly and tuned.
Indexing: v10.0 enhancements
Configurable Page Sizes
– allows wider indexes
• index rows cannot be split across pages
• page sizes from 2K through 16K
3000 byte index limit allows:
– wider indexes
– expanded UNICODE support
IDS v10.0 Enhancement – Online Index Build
The [ CREATE | DROP ] INDEX ... ONLINE statement creates or drops an index without placing an exclusive lock on the table for the duration of the operation.
You can use the CREATE INDEX … ONLINE statement even when reads or updates are occurring on the table. This means index creation can begin immediately.
– If you use this syntax to create an index on a table that other users are accessing, the index is not available until no user is updating the table.
– After you issue the new syntax to drop an index, no one can reference the index, but current DML operations can use the index until they terminate.
– Dropping the index is deferred until no user is using the index.
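In its simplest form the syntax is just the ONLINE keyword at the end of the statement. The table and index names below are hypothetical.

```sql
-- Build the index while other sessions keep reading and updating orders.
CREATE INDEX ix_orders_date ON orders (order_date) ONLINE;

-- Drop it the same way; the drop is deferred until no one is using the index.
DROP INDEX ix_orders_date ONLINE;
```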
New Considerations
The Ever-Changing Engine
Old School: Two+ Engines (OLTP Engine, DW Engine) – specific to OLTP or DW by engine or shift. Mostly due to limited resources.
New School: The Hybrid Engine (OLTP + DW) – many engines now are not exclusively OLTP or DW. Now an abundance of resources is available.
What’s New with Queries?
OLTP queries
– return more rows than before
– sequential scans may be a preferred method due to result set size
– more resources required for OLTP
DW queries
– many clients are moving their DWing to IDS
• IDS can handle many environments that only XPS could before
Warning! Warning!
Some true stories from the road…
– watch data growth as disk is more plentiful
• once had a client with 66,000+ extents
– don’t congest your engine as you add horsepower
• had a client that was trying to run 18,000+ reports in what was originally an OLTP engine
IDS 10.0 Enhancement : Configurable Page Sizes
[Figure: DISK and MEMORY, one buffer cache per page size]
rootdbs (2K pg), dbspace3 (2K pg): Buffer Cache, 2K pages
dbspace1 (16K pg), dbspace2 (16K pg): Buffer Cache, 16K pages
dbspace4 (8K pg): Buffer Cache, 8K pages
Benefit: will allow appropriate cache sizing and page sizing for large table/indexes.
IDS 10.0 Enhancement : External Optimizer Directives
SAVE EXTERNAL DIRECTIVES
    /*+ AVOID_INDEX (table1 index1) */,
    /*+ FULL (table1) */
ACTIVE FOR
SELECT /*+ INDEX (table1 index1) */ col1, col2
FROM table1, table2
WHERE table1.col1 = table2.col1;
This associates AVOID_INDEX and FULL directives with the specified query.
The inline INDEX directive is ignored by the optimizer when the external directives are applied to a query that matches the SELECT statement.
Benefit: will allow influencing of canned or closed queries – both OLTP or DW.
IDS v10.0 Enhancement : Memory Allocation for non-PDQ Queries
You can specify how much memory is allocated to non-PDQ queries.
– The default of 128K can be insufficient for queries that specify ORDER BY, GROUP BY, hash joins, or other memory-intensive options.
Use the new configuration parameter, DS_NONPDQ_QUERY_MEM, to specify more memory than the 128K that is allocated to non-PDQ queries by default.
Benefit: will allow DBA to give appropriate memory to OLTP queries (non-PDQ) without setting PDQ or disrupting the PDQ environment.
IDS v10.0 Enhancement: Dynamic OPTCOMPIND
You can use SET ENVIRONMENT OPTCOMPIND to set the OPTCOMPIND environment variable dynamically for the current session.
The value that you enter using this statement takes precedence over the current setting specified in the ONCONFIG file.
The default setting of the OPTCOMPIND environment variable is restored when your current session terminates.
Benefit: will allow DBA to have more control over execution of OLTP and DW queries in the same engine.
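As a sketch of how a session might use this in a mixed engine: switch to a cost-based setting for a reporting query, then restore the configured value. The table and column names in the SELECT are hypothetical.

```sql
-- '2' asks the optimizer to choose join methods purely on cost,
-- which can suit large DW-style joins.
SET ENVIRONMENT OPTCOMPIND '2';

SELECT c.region, SUM(o.amount)
FROM   customers c, order_facts o
WHERE  c.customer_id = o.customer_id
GROUP  BY c.region;

-- Return to the value configured for the server.
SET ENVIRONMENT OPTCOMPIND DEFAULT;
```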
Reference Material
Fragmentation Objectives : Fragment Elimination
Non-overlapping fragments on a single column
– can eliminate in both equality and range expressions.
– preferred method/scheme for elimination.
Fragmentation Objectives : Fragment Elimination
Overlapping fragments on a single column
– can eliminate in equality but not a range search.
Fragmentation Objectives : Fragment Elimination
Non-overlapping fragments on multiple columns
– can eliminate in equality but not a range search.
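The three schemes named in this reference section might look like the following. This is only an illustration: the tables f1..f3, their columns, the dbspaces dbs1..dbs4, and the boundary values are all hypothetical.

```sql
-- Non-overlapping fragments on a single column:
-- equality and range searches on a can both eliminate fragments.
CREATE TABLE f1 (a INTEGER, b CHAR(1))
    FRAGMENT BY EXPRESSION
        a <= 10            IN dbs1,
        a > 10 AND a <= 20 IN dbs2,
        a > 20             IN dbs3;

-- Overlapping fragments on a single column (21, 22, 23 satisfy both
-- expressions): equality searches can eliminate, range searches cannot.
CREATE TABLE f2 (a INTEGER, b CHAR(1))
    FRAGMENT BY EXPRESSION
        a <= 8 OR a IN (9, 10, 21, 22, 23) IN dbs1,
        a > 10                             IN dbs2;

-- Non-overlapping fragments on multiple columns:
-- equality searches can eliminate, range searches cannot.
CREATE TABLE f3 (a INTEGER, b CHAR(1))
    FRAGMENT BY EXPRESSION
        0 < a  AND a <= 10 AND b IN ('E', 'F', 'G') IN dbs1,
        0 < a  AND a <= 10 AND b IN ('H', 'I', 'J') IN dbs2,
        10 < a AND a <= 20 AND b IN ('E', 'F', 'G') IN dbs3,
        10 < a AND a <= 20 AND b IN ('H', 'I', 'J') IN dbs4;
```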