Five Tuning Tips For Your Data Warehouse Jeff Moss
Slide 2
My First Presentation
- Yes, my very first presentation
  - For BIRT SIG
  - For UKOUG
- Useful advice from friends and colleagues
  - Use graphics where appropriate
  - Find a friendly or familiar face in the audience
  - Imagine your audience is naked!
- ...but, like Oracle, be careful when combining advice!
Slide 3
Be Careful Combining Advice!
Thanks for the opportunity, Mark!
Slide 4
Agenda
- My background
- Five tips
  - Partition for success
  - Squeeze your data with data segment compression
  - Make the most of your PGA memory
  - Beware of temporal data affecting the optimizer
  - Find out where your query is at
- Questions
Slide 5
My Background
- Independent consultant
- 13 years Oracle experience
- Blog: http://oramossoracle.blogspot.com/
- Focused on warehousing / VLDB since 1998
- First project: UK Music Sales Data Mart
  - Produces BBC Radio 1 Top 40 chart and many more
  - 2 billion row sales fact table
  - 1TB total database size
- Currently working with Eon UK (Powergen)
  - 4TB production warehouse, 8TB total storage
  - Oracle product stack
Slide 6
What Is Partitioning?
- "Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions." (Oracle Database Concepts Manual, 10gR2)
- Introduced in Oracle 8.0
- Numerous improvements since
- Subpartitioning adds another level of decomposition
- Partitions and subpartitions are logical containers
Slide 7
Partition To Tablespace Mapping
- Partitions map to tablespaces
  - A partition can only be in one tablespace
  - A tablespace can hold many partitions
  - Highest granularity is one tablespace per partition
  - Lowest granularity is one tablespace for all the partitions
- Tablespace volatility
  - Read / Write
  - Read Only
[Figure: monthly partitions P_JAN_2005 to P_MAR_2006 mapped to quarterly tablespaces T_Q1_2005 to T_Q1_2006, each tablespace marked Read / Write or Read Only]
Slide 8
Why Partition? - Performance
- Improved query performance
  - Pruning or elimination
  - Partition wise joins
- Read only partitions
  - Quicker checkpointing
  - Quicker backup
  - Quicker recovery
  - ...but it depends on the mapping of partition : tablespace : datafile
[Figure: Sales Fact Table partitioned by month, JAN to DEC, with the example query]
    SELECT SUM(sales) FROM part_tab
    WHERE sales_date BETWEEN '01-JAN-2005' AND '30-JUN-2005'
* Oracle 10gR2 Data Warehousing Manual
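To make the pruning idea concrete, here is a minimal Python sketch of the decision Oracle takes for the example query: keep only the partitions whose date range overlaps the predicate. It is a toy model, not Oracle's implementation. The helper names (monthly_partitions, prune) and the partition bounds are assumptions made for illustration.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def monthly_partitions(year):
    """Return (name, low, high) tuples; low is inclusive, high is exclusive."""
    parts = []
    for i, name in enumerate(MONTHS, start=1):
        low = date(year, i, 1)
        high = date(year + 1, 1, 1) if i == 12 else date(year, i + 1, 1)
        parts.append((name, low, high))
    return parts

def prune(partitions, start, end):
    """Keep only partitions whose range overlaps the closed interval [start, end]."""
    return [name for name, low, high in partitions
            if low <= end and start < high]

# Hypothetical stand-in for the monthly Sales Fact Table on the slide.
parts = monthly_partitions(2005)

# Equivalent of: WHERE sales_date BETWEEN '01-JAN-2005' AND '30-JUN-2005'
scanned = prune(parts, date(2005, 1, 1), date(2005, 6, 30))
print(scanned)  # ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN']
```

Half of the partitions are never touched, which is where the query performance gain comes from.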
Slide 9
Why Partition? - Manageability
- Archiving
  - Use a rolling window approach
  - ALTER TABLE ... ADD/SPLIT/DROP PARTITION
- Easier ETL processing
  - Build a new dataset in a staging table
  - Add indexes and constraints
  - Collect statistics
  - Then swap the staging table for a partition on the target
  - ALTER TABLE ... EXCHANGE PARTITION
- Easier maintenance
  - Table partition move, e.g. to compress data
  - Local index partition rebuild
Slide 10
Why Partition? - Scalability
- Partitioning is generally consistent and predictable
  - Assuming an appropriate partitioning key is used
  - ...and data has an even distribution across the key
- Read only approach
  - Scalable backups: read only tablespaces are ignored, so partitions in those tablespaces are ignored
- Pruning allows consistent query performance
Slide 11
Why Partition? - Availability
- Offline data impact minimised
  - ...depending on granularity
  - Quicker recovery
  - Pruned data not missed
- EXCHANGE PARTITION
  - Allows offline build
  - Quick swap over
[Figure: monthly partitions P_JAN_2005 to P_MAR_2006 mapped to quarterly tablespaces T_Q1_2005 to T_Q1_2006, each tablespace marked Read / Write or Read Only]
Slide 12
Fact Table Partitioning

Partitioned by Load Date
  Partition   Tran Date     Customer     Load Date
  January     07-JAN-2005   Customer 1   09-JAN-2005
  January     15-JAN-2005   Customer 2   17-JAN-2005
  February    22-JAN-2005   Customer 3   01-FEB-2005
  February    02-FEB-2005   Customer 4   05-FEB-2005
  February    26-FEB-2005   Customer 5   28-FEB-2005
  March       06-MAR-2005   Customer 2   07-MAR-2005
  March       12-MAR-2005   Customer 3   15-MAR-2005
  April       21-JAN-2005   Customer 7   04-APR-2005
  April       09-APR-2005   Customer 9   10-APR-2005
- Easier ETL processing
  - Each load deals with only 1 partition
- No use to end user queries!
  - Can't prune
  - Full scans!

Partitioned by Transaction Date
  Partition   Tran Date     Customer     Load Date
  January     07-JAN-2005   Customer 1   09-JAN-2005
  January     15-JAN-2005   Customer 2   17-JAN-2005
  January     21-JAN-2005   Customer 7   04-APR-2005
  January     22-JAN-2005   Customer 3   01-FEB-2005
  February    02-FEB-2005   Customer 4   05-FEB-2005
  February    26-FEB-2005   Customer 5   28-FEB-2005
  March       06-MAR-2005   Customer 2   07-MAR-2005
  March       12-MAR-2005   Customer 3   15-MAR-2005
  April       09-APR-2005   Customer 9   10-APR-2005
- Harder ETL processing
  - But still uses EXCHANGE PARTITION
- Useful to end user queries
  - Allows full pruning capability
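The trade-off between the two partitioning keys can be checked with a small Python sketch over the nine rows shown on this slide. It only reports which monthly partitions hold the relevant rows under each key. It is a toy model, and the helper names (month_of, partitions_holding) are assumptions made for illustration.

```python
from datetime import date

# (transaction date, customer, load date) rows taken from the slide.
ROWS = [
    (date(2005, 1, 7), "Customer 1", date(2005, 1, 9)),
    (date(2005, 1, 15), "Customer 2", date(2005, 1, 17)),
    (date(2005, 1, 22), "Customer 3", date(2005, 2, 1)),
    (date(2005, 2, 2), "Customer 4", date(2005, 2, 5)),
    (date(2005, 2, 26), "Customer 5", date(2005, 2, 28)),
    (date(2005, 3, 6), "Customer 2", date(2005, 3, 7)),
    (date(2005, 3, 12), "Customer 3", date(2005, 3, 15)),
    (date(2005, 1, 21), "Customer 7", date(2005, 4, 4)),
    (date(2005, 4, 9), "Customer 9", date(2005, 4, 10)),
]
TRAN, LOAD = 0, 2  # column positions within each row

def month_of(d):
    return (d.year, d.month)

def partitions_holding(rows, key_col, filter_col, wanted_month):
    """Monthly partitions (by key_col) that hold rows whose filter_col is in wanted_month."""
    return sorted({month_of(r[key_col]) for r in rows
                   if month_of(r[filter_col]) == wanted_month})

jan, apr = (2005, 1), (2005, 4)

# End user query: all January transactions.
query_by_load = partitions_holding(ROWS, LOAD, TRAN, jan)
query_by_tran = partitions_holding(ROWS, TRAN, TRAN, jan)

# ETL: the batch of rows loaded in April.
etl_by_load = partitions_holding(ROWS, LOAD, LOAD, apr)
etl_by_tran = partitions_holding(ROWS, TRAN, LOAD, apr)

print(query_by_load)  # [(2005, 1), (2005, 2), (2005, 4)]
print(query_by_tran)  # [(2005, 1)]
print(etl_by_load)    # [(2005, 4)]
print(etl_by_tran)    # [(2005, 1), (2005, 4)]
```

With load date as the key, January transactions are scattered over three partitions, and because the optimizer cannot know in advance where they are, it has to scan everything. With transaction date as the key the query touches one partition, at the cost of the April load touching two.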
Slide 13
Watch out for
- Partition exchange and table statistics [1]
  - Partition stats updated but global stats are NOT!
  - Affects queries accessing multiple partitions
- Solution
  - Gather stats on staging table prior to EXCHANGE
  - Gather stats on partitioned table using GLOBAL
1 - Jonathan Lewis: Cost-Based Oracle Fundamentals, Chapter 2
What Is Data Segment Compression?
- Compresses data by eliminating intra block repeated column values
- Reduces the space required for a segment
  - ...but only if there are appropriate repeats!
- Self contained
- Lossless algorithm
Slide 16
Where Can Data Segment Compression Be Used?
- Can be used with a number of segment types
  - Heap & nested tables
  - Range or list partitions
  - Materialized views
- Can't be used with
  - Subpartitions
  - Hash partitions
  - Indexes, but they have row level compression
  - IOT
  - External tables
  - Tables that are part of a cluster
  - LOBs
Slide 17
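How Does Segment Compression Work?
[Figure: a database block made up of a Symbol Table and a Row Data Area]
Example rows:
  ID    DESCRIPTION                   CONTACT TYPE   OUTCOME   FOLLOWUP
  100   Call to discuss bill amount   TEL            NO        YES
  101   Call to discuss new product   MAIL           NO        N/A
  102   Call to discuss new product   TEL            YES       N/A
Values as numbered in the figure:
  1 = 100, 2 = Call to discuss bill amount, 3 = TEL, 4 = NO, 5 = YES,
  6 = 101, 7 = Call to discuss new product, 8 = MAIL, 9 = N/A, 10 = 102
Rows expressed as value references:
  100: 1 2 3 4 5
  101: 6 7 8 4 9
  102: 10 7 3 5 9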
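The symbol table idea can be sketched in a few lines of Python using the three rows from this slide. This is a toy model under stated assumptions, not Oracle's block format: any value that occurs more than once in the block is moved to a symbol table, and rows keep a reference to it. The function names are hypothetical.

```python
def compress_block(rows):
    """Toy model: values seen more than once in the block go into a symbol
    table; rows then store ("ref", position) instead of the value itself."""
    counts = {}
    for row in rows:
        for value in row:
            counts[value] = counts.get(value, 0) + 1
    # dicts keep insertion order, so symbols appear in first-seen order
    symbols = [value for value, n in counts.items() if n > 1]
    position = {value: i for i, value in enumerate(symbols)}
    encoded = [[("ref", position[v]) if v in position else ("lit", v)
                for v in row]
               for row in rows]
    return symbols, encoded

def decompress_block(symbols, encoded):
    """Rebuild the original rows: the scheme is lossless."""
    return [[symbols[x] if kind == "ref" else x for kind, x in row]
            for row in encoded]

block = [
    [100, "Call to discuss bill amount", "TEL", "NO", "YES"],
    [101, "Call to discuss new product", "MAIL", "NO", "N/A"],
    [102, "Call to discuss new product", "TEL", "YES", "N/A"],
]

symbols, encoded = compress_block(block)
print(symbols)
# ['TEL', 'NO', 'YES', 'Call to discuss new product', 'N/A']
print(decompress_block(symbols, encoded) == block)  # True
```

Only values that repeat inside the same block earn a symbol table entry, which is why the slides stress that compression helps only when there are appropriate repeats.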
Slide 18
Pros & Cons
- Pros
  - Saves space
    - Reduces LIO / PIO
    - Speeds up backup/recovery
    - Improves query response time
  - Transparent
    - To readers
    - ...and writers
  - Decreases time to perform some DML
    - Deletes should be quicker
    - Bulk inserts may be quicker
- Cons
  - Increases CPU load
  - Can only be used on direct path operations
    - CTAS
    - Serial inserts using INSERT /*+ APPEND */
    - Parallel inserts (PDML)
    - ALTER TABLE ... MOVE
    - Direct path SQL*Loader
  - Increases time to perform some DML
    - Bulk inserts may be slower
    - Updates are slower
Slide 19
Ordering Your Data For Maximum Benefits
- Colocate data to maximise compression benefits
- For maximum compression
  - Minimise the total space required by the segment
  - Identify the most compressible column(s)
- For optimal access
  - We know how the data is to be queried
  - Order the data by
    - Access path columns
    - Then the next most compressible column(s)
[Figure: four blocks of five values each. Uniformly distributed: 12345 12345 12345 12345. Colocated: 11112 22233 33444 45555]
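The effect of colocating values can be estimated with a short Python sketch that replays the figure: twenty values, five per block, once uniformly distributed and once sorted. The cost model is an assumption made for illustration (each distinct value is stored once per block), not a measurement of Oracle's storage.

```python
def stored_values(values, rows_per_block):
    """Toy cost model: within each block a repeated value is stored only once."""
    total = 0
    for start in range(0, len(values), rows_per_block):
        total += len(set(values[start:start + rows_per_block]))
    return total

uniform = [1, 2, 3, 4, 5] * 4   # every block holds 1 2 3 4 5
colocated = sorted(uniform)     # 1 1 1 1 2 | 2 2 2 3 3 | 3 3 4 4 4 | 4 5 5 5 5

print(stored_values(uniform, 5))    # 20
print(stored_values(colocated, 5))  # 8
```

The same twenty values need far fewer stored entries once equal values sit next to each other, which is the argument for loading data in a compression-friendly order.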
Slide 20
Get Max Compression Order Package

PROCEDURE mgmt_p_get_max_compress_order
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 P_TABLE_OWNER                  VARCHAR2                IN     DEFAULT
 P_TABLE_NAME                   VARCHAR2                IN
 P_PARTITION_NAME               VARCHAR2                IN     DEFAULT
 P_SAMPLE_SIZE                  NUMBER                  IN     DEFAULT
 P_PREFIX_COLUMN1               VARCHAR2                IN     DEFAULT
 P_PREFIX_COLUMN2               VARCHAR2                IN     DEFAULT
 P_PREFIX_COLUMN3               VARCHAR2                IN     DEFAULT

BEGIN
  mgmt_p_get_max_compress_order(p_table_owner => 'AE_MGMT',
                                p_table_name  => 'BIG_TABLE',
                                p_sample_size => 10000);
END;
/

Running mgmt_p_get_max_compress_order...
----------------------------------------------------------------------------------------------------
Table        : BIG_TABLE
Sample Size  : 10000
Unique Run ID: 25012006232119
ORDER BY Prefix:
----------------------------------------------------------------------------------------------------
Creating MASTER Table  : TEMP_MASTER_25012006232119
Creating COLUMN Table 1: COL1
Creating COLUMN Table 2: COL2
Creating COLUMN Table 3: COL3
----------------------------------------------------------------------------------------------------
The output below lists each column in the table and the number of
blocks/rows and space used when the table data is ordered by only
that column, or in the case where a prefix has been specified,
where the table data is ordered by the prefix and then that column.
From this one can determine if there is a specific ORDER BY which
can be applied to the data in order to maximise compression
within the table whilst, in the case of a prefix being present,
ordering data as efficiently as possible for the most common access
path(s).
----------------------------------------------------------------------------------------------------
NAME                           COLUMN                         BLOCKS       ROWS         SPACE_GB
============================== ============================== ============ ============ ========
TEMP_COL_001_25012006232119    COL1                                    290        10000    .0022
TEMP_COL_002_25012006232119    COL2                                    345        10000    .0026
TEMP_COL_003_25012006232119    COL3                                    555        10000    .0042
Slide 21
Data Warehousing Specifics
- Star schema compresses better than normalized
  - More redundant data
- Focus on
  - Fact tables and summaries in star schema
  - Transaction tables in normalized schema
- Performance impact [1]
  - Space savings
    - Star schema: 67%
    - Normalized: 24%
  - Query elapsed times
    - Star schema: 16.5%
    - Normalized: 10%
1 - Table Compression in Oracle 9iR2: A Performance Analysis
Slide 22
Things To Watch Out For
- DROP COLUMN is awkward
  - ORA-39726: Unsupported add/drop column operation on compressed tables
  - Uncompress the table and try again - still gives ORA-39726!
- After UPDATEs data is uncompressed
  - Performance impact
  - Row migration
- Use appropriate physical design settings
  - PCTFREE 0 - pack each block
  - Large blocksize - reduce overhead / increase repeats per block
Slide 23
PGA Memory: What For?
- Sorts
  - Standard sorts [SORT]
  - Buffer [BUFFER]
  - Group By [GROUP BY (SORT)]
  - Connect By [CONNECT-BY (SORT)]
  - Rollup [ROLLUP (SORT)]
  - Window [WINDOW (SORT)]
- Hash joins [HASH-JOIN]
- Indexes
  - Maintenance [IDX MAINTENANCE SOR]
  - Bitmap Merge [BITMAP MERGE]
  - Bitmap Create [BITMAP CREATE]
- Write buffers [LOAD WRITE BUFFERS]
[] = V$SQL_WORKAREA.OPERATION_TYPE
[Figure: serial process PGA for a dedicated server, containing cursors, variables and the sort area]
Slide 24
PGA Memory Management: Manual
- The old way of doing things
  - Still available, though, even in 10g R2
- Configuring
  - ALTER SESSION SET WORKAREA_SIZE_POLICY=MANUAL;
  - Initialisation parameter: WORKAREA_SIZE_POLICY=MANUAL
- Set memory parameters yourself
  - HASH_AREA_SIZE
  - SORT_AREA_SIZE
  - SORT_AREA_RETAINED_SIZE
  - BITMAP_MERGE_AREA_SIZE
  - CREATE_BITMAP_AREA_SIZE
- Optimal values depend on the type of work [1]
  - One size does not fit all!
1 - Richmond Shee: If Your Memory Serves You Right
Slide 25
PGA Memory Management: Automatic
- The new way from 9i R1
  - Default OFF in 9i R1/R2
  - Enabled by setting at session/instance level:
    - WORKAREA_SIZE_POLICY=AUTO
    - PGA_AGGREGATE_TARGET > 0
  - Default ON since 10g R1
- Oracle dynamically manages the available memory to suit the workload
  - But of course, it's not perfect!
Joe Seneganik - Advanced Management Of Working Areas In Oracle 9i/10g, presented at UKOUG 2005
Slide 26
Auto PGA Parameters: Pre 10gR2
- WORKAREA_SIZE_POLICY
  - Set to AUTO
- PGA_AGGREGATE_TARGET
  - The target for summed PGA across all processes
  - Can be exceeded if too small (over allocation)
- _PGA_MAX_SIZE
  - Target maximum PGA size for a single process
  - Default is a fixed value of 200MB
  - Hidden / undocumented parameter
    - Usual caveats apply
Slide 27
Auto PGA Parameters: Pre 10gR2
- _SMM_MAX_SIZE
  - Limit for a single workarea operation for one process
  - Derived default: LEAST(5% of PGA_AGGREGATE_TARGET, 50% of _PGA_MAX_SIZE)
  - Hits limit of 100MB
    - When PGA_AGGREGATE_TARGET is >= 2000MB
    - And _PGA_MAX_SIZE is left at default of 200MB
  - Hidden / undocumented parameter
    - Usual caveats apply
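The derived default can be worked through with a tiny Python sketch of the LEAST() rule quoted on this slide. The function name and the choice of whole megabytes are assumptions made for illustration. The formula itself is the one stated above for releases before 10gR2.

```python
def smm_max_size_mb(pga_aggregate_target_mb, pga_max_size_mb=200):
    """LEAST(5% of PGA_AGGREGATE_TARGET, 50% of _PGA_MAX_SIZE), in whole MB."""
    five_percent_of_target = pga_aggregate_target_mb * 5 // 100
    half_of_pga_max = pga_max_size_mb // 2
    return min(five_percent_of_target, half_of_pga_max)

for target in (500, 1000, 2000, 3000, 8000):
    print(target, smm_max_size_mb(target))
# 500 25
# 1000 50
# 2000 100
# 3000 100
# 8000 100
```

From 2000MB upwards, raising PGA_AGGREGATE_TARGET no longer increases the per-workarea limit, because 50% of the default _PGA_MAX_SIZE becomes the smaller term.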
Slide 28
Auto PGA Parameters: Pre 10gR2
- _SMM_PX_MAX_SIZE
  - Limit for all the parallel slaves of a single workarea operation
  - Derived default: 30% of PGA_AGGREGATE_TARGET
  - Hidden / undocumented parameter
    - Usual caveats apply
  - Parallel slaves still limited by _SMM_MAX_SIZE
  - Impacts only when ...
[Figure: PGA_AGGREGATE_TARGET = 3000MB, _PGA_MAX_SIZE = 200MB, _SMM_MAX_SIZE = 100MB, _SMM_PX_MAX_SIZE = 900MB. One panel shows Sessions 1 to 8 at 100MB each; the other shows Sessions 1 to 12 at 75MB each]
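The numbers in the figure can be reproduced with a short Python sketch. It assumes that each parallel slave gets the smaller of _SMM_MAX_SIZE and an equal share of _SMM_PX_MAX_SIZE. That reading is inferred from the figure rather than stated on the slide, and the function names are hypothetical.

```python
def px_limits_mb(pga_aggregate_target_mb, pga_max_size_mb=200):
    """Pre-10gR2 derived defaults quoted on the slides, in whole MB."""
    smm_max = min(pga_aggregate_target_mb * 5 // 100, pga_max_size_mb // 2)
    smm_px_max = pga_aggregate_target_mb * 30 // 100
    return smm_max, smm_px_max

def per_slave_mb(dop, smm_max, smm_px_max):
    """Assumed rule: a slave is capped by _SMM_MAX_SIZE and by an equal
    share of _SMM_PX_MAX_SIZE, whichever is smaller."""
    return min(smm_max, smm_px_max / dop)

smm_max, smm_px_max = px_limits_mb(3000)
print(smm_max, smm_px_max)                    # 100 900
print(per_slave_mb(8, smm_max, smm_px_max))   # 100
print(per_slave_mb(12, smm_max, smm_px_max))  # 75.0
```

Under this assumption the parallel limit only starts to bite once the degree of parallelism exceeds _SMM_PX_MAX_SIZE / _SMM_MAX_SIZE, which is 9 for the values in the figure.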
Slide 29
10gR2 Improvements
- _SMM_MAX_SIZE now the driver
  - More advanced algorithm
  - _PGA_MAX_SIZE = 2 * _SMM_MAX_SIZE
- Parallel operations
  - _SMM_PX_MAX_SIZE = 50% * PGA_AGGREGATE_TARGET
  - When DOP > 5, _smm_px_max_size / DOP is used
PGA_AGGREGATE_TARGET    _SMM_MAX_SIZE