Five Tuning Tips For Your Data Warehouse Jeff Moss.


  • Slide 1
  • Five Tuning Tips For Your Data Warehouse Jeff Moss
  • Slide 2
  • My First Presentation Yes, my very first presentation: for the BIRT SIG, for the UKOUG. Useful advice from friends and colleagues: use graphics where appropriate; find a friendly or familiar face in the audience; imagine your audience is naked! But, like Oracle, be careful when combining advice!
  • Slide 3
  • Be Careful Combining Advice! Thanks for the opportunity Mark!
  • Slide 4
  • Agenda My background, then five tips: 1. Partition for success 2. Squeeze your data with data segment compression 3. Make the most of your PGA memory 4. Beware of temporal data affecting the optimizer 5. Find out where your query is at. Questions
  • Slide 5
  • My Background Independent consultant, 13 years Oracle experience. Blog: http://oramossoracle.blogspot.com/ Focused on warehousing / VLDB since 1998. First project: UK Music Sales Data Mart, which produces the BBC Radio 1 Top 40 chart and many more; 2 billion row sales fact table, 1Tb total database size. Currently working with Eon UK (Powergen): 4Tb production warehouse, 8Tb total storage, Oracle product stack
  • Slide 6
  • What Is Partitioning ? "Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions." (Oracle Database Concepts Manual, 10gR2.) Introduced in Oracle 8.0, with numerous improvements since. Subpartitioning adds another level of decomposition. Partitions and subpartitions are logical containers. A minimal sketch follows below.
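    A sketch of the kind of range partitioned table the slides describe; all names (sales_fact, t_q1_2005, ...) are illustrative, not taken from the deck:

      -- Monthly range partitions on the transaction date, mapped onto
      -- quarterly tablespaces as in the slide that follows.
      CREATE TABLE sales_fact (
        sales_date  DATE         NOT NULL,
        customer_id NUMBER       NOT NULL,
        amount      NUMBER(12,2)
      )
      PARTITION BY RANGE (sales_date) (
        PARTITION p_jan_2005 VALUES LESS THAN (TO_DATE('01-02-2005','DD-MM-YYYY')) TABLESPACE t_q1_2005,
        PARTITION p_feb_2005 VALUES LESS THAN (TO_DATE('01-03-2005','DD-MM-YYYY')) TABLESPACE t_q1_2005,
        PARTITION p_mar_2005 VALUES LESS THAN (TO_DATE('01-04-2005','DD-MM-YYYY')) TABLESPACE t_q1_2005
      );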
  • Slide 7
  • Partition To Tablespace Mapping Partitions map to tablespaces. A partition can only be in one tablespace; a tablespace can hold many partitions. Highest granularity: one tablespace per partition. Lowest granularity: one tablespace for all the partitions. Tablespace volatility: read/write or read only. (Diagram: monthly partitions P_JAN_2005 through P_MAR_2006 mapped onto quarterly tablespaces T_Q1_2005 through T_Q1_2006, with older quarters read only and the current quarter read/write.)
  • Slide 8
  • Why Partition ? - Performance Improved query performance: pruning (elimination) and partition-wise joins. Read only partitions: quicker checkpointing, quicker backup, quicker recovery, but it depends on the mapping of partition:tablespace:datafile. (Diagram: a sales fact table with monthly partitions JAN through DEC; the query SELECT SUM(sales) FROM part_tab WHERE sales_date BETWEEN '01-JAN-2005' AND '30-JUN-2005' prunes to the first six partitions; see the pruning check below.) * Oracle 10gR2 Data Warehousing Manual
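    To verify that pruning actually happens, the Pstart/Pstop columns of the execution plan can be checked; a minimal sketch using the slide's query, with TO_DATE added to avoid relying on implicit conversion:

      EXPLAIN PLAN FOR
        SELECT SUM(sales)
        FROM   part_tab
        WHERE  sales_date BETWEEN TO_DATE('01-01-2005','DD-MM-YYYY')
                              AND TO_DATE('30-06-2005','DD-MM-YYYY');

      -- Pstart/Pstop in the output show the partitions actually scanned,
      -- e.g. Pstart=1 Pstop=6 for the January to June partitions.
      SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);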
  • Slide 9
  • Why Partition ? - Manageability Archiving: use a rolling window approach with ALTER TABLE ... ADD/SPLIT/DROP PARTITION. Easier ETL processing: build a new dataset in a staging table, add indexes and constraints, collect statistics, then swap the staging table for a partition on the target with ALTER TABLE ... EXCHANGE PARTITION (a sketch follows below). Easier maintenance: table partition move, e.g. to compress data; local index partition rebuild.
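    A sketch of the rolling window and the staging table swap; table and partition names are illustrative:

      -- Rolling window: add the new month at the top, drop the oldest.
      ALTER TABLE sales_fact ADD PARTITION p_apr_2006
        VALUES LESS THAN (TO_DATE('01-05-2006','DD-MM-YYYY'));
      ALTER TABLE sales_fact DROP PARTITION p_jan_2005;

      -- ETL: build, index, constrain and analyse stg_sales_apr_2006
      -- offline, then swap it in as the new partition in one operation.
      ALTER TABLE sales_fact
        EXCHANGE PARTITION p_apr_2006 WITH TABLE stg_sales_apr_2006
        INCLUDING INDEXES WITHOUT VALIDATION;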
  • Slide 10
  • Why Partition ? - Scalability Partitioning is generally consistent and predictable, assuming an appropriate partitioning key is used and the data has an even distribution across that key. Read only approach: scalable backups, because read only tablespaces are ignored, so the partitions in those tablespaces are ignored. Pruning allows consistent query performance.
  • Slide 11
  • Why Partition ? - Availability Offline data impact is minimised, depending on granularity. Quicker recovery. Pruned data is not missed. EXCHANGE PARTITION allows an offline build and a quick swap over. (Diagram: the same monthly partition to quarterly tablespace mapping as before, with read only and read/write tablespaces.)
  • Slide 12
  • Fact Table Partitioning: Load Date versus Transaction Date. Partitioning by load date gives easier ETL processing, since each load deals with only one partition, but it is no use to end user queries: they can't prune, so they full scan. Partitioning by transaction date makes ETL harder, but it still uses EXCHANGE PARTITION, and it is useful to end user queries because it allows full pruning capability. (Diagram: the same rows, e.g. tran date 21-JAN-2005 / Customer 7 / load date 04-APR-2005, land in the April partition under load date partitioning but in the January partition under transaction date partitioning.)
  • Slide 13
  • Watch Out For Partition Exchange And Table Statistics Partition stats are updated by the exchange, but global stats are NOT, which affects queries accessing multiple partitions. Solution: gather stats on the staging table prior to the EXCHANGE, then gather stats on the partitioned table using GLOBAL granularity (a sketch follows below). Jonathan Lewis: Cost-Based Oracle Fundamentals, Chapter 2
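    A sketch of the two stats gathering steps, reusing the illustrative names from the earlier sketches:

      -- 1. Gather stats on the staging table BEFORE the exchange,
      --    so the partition arrives with good statistics.
      BEGIN
        DBMS_STATS.GATHER_TABLE_STATS(ownname => USER,
                                      tabname => 'STG_SALES_APR_2006',
                                      cascade => TRUE);
      END;
      /

      -- 2. Refresh GLOBAL stats on the partitioned table AFTER the
      --    exchange, since the swap does not maintain them.
      BEGIN
        DBMS_STATS.GATHER_TABLE_STATS(ownname     => USER,
                                      tabname     => 'SALES_FACT',
                                      granularity => 'GLOBAL');
      END;
      /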
  • Slide 14
  • Partitioning Feature / Characteristic Matrix Characteristics: Performance, Manageability, Scalability, Availability. Features: Read Only Partitions, Pruning (Partition Elimination), Partition-wise Joins, Parallel DML, Archiving, Exchange Partition, Partition Truncation, Local Indexes. (The original slide showed a matrix ticking which features deliver which characteristics.)
  • Slide 15
  • What Is Data Segment Compression ? Compresses data by eliminating intra-block repeated column values. Reduces the space required for a segment, but only if there are appropriate repeats! Self-contained, lossless algorithm.
  • Slide 16
  • Where Can Data Segment Compression Be Used ? It can be used with a number of segment types: heap and nested tables, range or list partitions, and materialized views. It can't be used with subpartitions, hash partitions, indexes (though they have their own row level compression), IOTs, external tables, tables that are part of a cluster, or LOBs. A sketch of enabling it follows below.
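    A minimal sketch of enabling compression on the segment types above; names are illustrative:

      -- Compress an existing partition by moving it (a direct path operation).
      ALTER TABLE sales_fact MOVE PARTITION p_jan_2005 COMPRESS;

      -- Or create a segment compressed from the outset with CTAS.
      CREATE TABLE sales_fact_c COMPRESS AS SELECT * FROM sales_fact;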
  • Slide 17
  • How Does Segment Compression Work ? Within each database block, a symbol table holds each repeated column value once, and the row data area stores references to the symbols instead of the values themselves. (Example from the slide: rows 100, 101 and 102 of a contact table with columns ID, DESCRIPTION, CONTACT TYPE, OUTCOME, FOLLOWUP; repeated values such as "Call to discuss bill amount", "Call to discuss new product", TEL, MAIL, NO, YES and N/A are stored once in the symbol table and referenced from the rows.)
  • Slide 18
  • Pros & Cons Pros: saves space; reduces LIO / PIO; speeds up backup/recovery; improves query response time; transparent to readers and writers; decreases time to perform some DML (deletes should be quicker, bulk inserts may be quicker). Cons: increases CPU load; compression only happens on direct path operations, i.e. CTAS, serial inserts using INSERT /*+ APPEND */, parallel inserts (PDML), ALTER TABLE ... MOVE, and direct path SQL*Loader (see the sketch below); increases time to perform some DML (bulk inserts may be slower, updates are slower).
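    A minimal sketch of the direct path requirement; a conventional INSERT into the same table would store its rows uncompressed:

      -- APPEND forces a direct path (serial) insert, so the new rows
      -- are compressed as they are loaded above the high water mark.
      INSERT /*+ APPEND */ INTO sales_fact_c
      SELECT * FROM stg_sales_apr_2006;
      COMMIT;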
  • Slide 19
  • Ordering Your Data For Maximum Benefits Colocate data to maximise compression benefits. For maximum compression: minimise the total space required by the segment and identify the most compressible column(s). For optimal access: we know how the data is to be queried, so order the data by the access path columns first, then by the next most compressible column(s). A sketch follows below. (Diagram: the values 1 to 5 uniformly distributed across blocks versus colocated as runs of 1s, 2s, 3s, ...)
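    A sketch of colocating data during a direct path rebuild; the choice of ORDER BY columns is illustrative:

      -- Ordering the rows before the load turns scattered repeats into
      -- runs, increasing the repeats per block and so the compression.
      CREATE TABLE sales_fact_sorted COMPRESS AS
      SELECT *
      FROM   sales_fact
      ORDER  BY sales_date,    -- the common access path column first
                customer_id;   -- then the most compressible column(s)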
  • Slide 20
  • Get Max Compression Order Package
    PROCEDURE mgmt_p_get_max_compress_order
    Argument Name                  Type                    In/Out Default?
    ------------------------------ ----------------------- ------ --------
    P_TABLE_OWNER                  VARCHAR2                IN     DEFAULT
    P_TABLE_NAME                   VARCHAR2                IN
    P_PARTITION_NAME               VARCHAR2                IN     DEFAULT
    P_SAMPLE_SIZE                  NUMBER                  IN     DEFAULT
    P_PREFIX_COLUMN1               VARCHAR2                IN     DEFAULT
    P_PREFIX_COLUMN2               VARCHAR2                IN     DEFAULT
    P_PREFIX_COLUMN3               VARCHAR2                IN     DEFAULT

    BEGIN
      mgmt_p_get_max_compress_order(p_table_owner => 'AE_MGMT',
                                    p_table_name  => 'BIG_TABLE',
                                    p_sample_size => 10000);
    END;
    /

    Running mgmt_p_get_max_compress_order...
    Table : BIG_TABLE  Sample Size : 10000  Unique Run ID: 25012006232119  ORDER BY Prefix:
    Creating MASTER Table : TEMP_MASTER_25012006232119
    Creating COLUMN Table 1: COL1
    Creating COLUMN Table 2: COL2
    Creating COLUMN Table 3: COL3

    The output below lists each column in the table and the number of blocks/rows and space used when the table data is ordered by only that column, or, where a prefix has been specified, by the prefix and then that column. From this one can determine whether there is a specific ORDER BY which can be applied to the data to maximise compression within the table whilst, where a prefix is present, ordering data as efficiently as possible for the most common access path(s).

    NAME                           COLUMN   BLOCKS   ROWS    SPACE_GB
    ============================== ======== ======== ======= ========
    TEMP_COL_001_25012006232119    COL1     290      10000   .0022
    TEMP_COL_002_25012006232119    COL2     345      10000   .0026
    TEMP_COL_003_25012006232119    COL3     555      10000   .0042
  • Slide 21
  • Data Warehousing Specifics A star schema compresses better than a normalized one, because it holds more redundant data. Focus on fact tables and summaries in a star schema, and on transaction tables in a normalized schema. Performance impact 1: space savings of 67% (star schema) versus 24% (normalized); query elapsed times of 16.5% (star schema) versus 10% (normalized). 1 - Table Compression in Oracle 9iR2: A Performance Analysis
  • Slide 22
  • Things To Watch Out For DROP COLUMN is awkward: ORA-39726 "unsupported add/drop column operation on compressed tables". Uncompressing the table and trying again still gives ORA-39726! After UPDATEs the data is uncompressed, with a performance impact and row migration. Use appropriate physical design settings: PCTFREE 0 to pack each block, and a large blocksize to reduce overhead and increase repeats per block (a sketch follows below).
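    A sketch of those physical design settings; the names are illustrative, and the 16k tablespace is an assumption (any larger-than-default blocksize tablespace would do):

      -- PCTFREE 0 packs each block; a large blocksize tablespace
      -- reduces per-block overhead and increases repeats per block.
      CREATE TABLE contact_history (
        id          NUMBER,
        description VARCHAR2(100)
      )
      PCTFREE 0
      TABLESPACE ts_16k
      COMPRESS;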
  • Slide 23
  • PGA Memory: What For ? Sorts: standard sorts [SORT], buffers [BUFFER], group by [GROUP BY (SORT)], connect by [CONNECT-BY (SORT)], rollup [ROLLUP (SORT)], window functions [WINDOW (SORT)]. Hash joins [HASH-JOIN]. Indexes: maintenance [IDX MAINTENANCE SOR], bitmap merge [BITMAP MERGE], bitmap create [BITMAP CREATE]. Write buffers [LOAD WRITE BUFFERS]. The bracketed names are the values of V$SQL_WORKAREA.OPERATION_TYPE. (Diagram: a serial process with a dedicated server, whose PGA holds cursors, variables and the sort area.)
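    To see which of those workarea operations an instance actually runs, and how much memory they would ideally like, V$SQL_WORKAREA can be queried; a minimal sketch:

      SELECT   operation_type,
               COUNT(*)                           workareas,
               MAX(estimated_optimal_size)/1024   max_optimal_kb
      FROM     v$sql_workarea
      GROUP BY operation_type
      ORDER BY max_optimal_kb DESC;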
  • Slide 24
  • PGA Memory Management: Manual The old way of doing things, though still available even in 10gR2. Configuring: ALTER SESSION SET WORKAREA_SIZE_POLICY=MANUAL; or the initialisation parameter WORKAREA_SIZE_POLICY=MANUAL. Set the memory parameters yourself: HASH_AREA_SIZE, SORT_AREA_SIZE, SORT_AREA_RETAINED_SIZE, BITMAP_MERGE_AREA_SIZE, CREATE_BITMAP_AREA_SIZE. Optimal values depend on the type of work 1: one size does not fit all! (A sketch follows below.) 1 - Richmond Shee: If Your Memory Serves You Right
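    A minimal sketch of manual configuration for one session; the sizes are illustrative only:

      ALTER SESSION SET workarea_size_policy = MANUAL;
      ALTER SESSION SET sort_area_size = 104857600;   -- 100Mb for big sorts
      ALTER SESSION SET hash_area_size = 209715200;   -- 200Mb for hash joins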
  • Slide 25
  • PGA Memory Management: Automatic The new way from 9iR1; default OFF in 9iR1/R2. Enabled by setting, at session or instance level: WORKAREA_SIZE_POLICY=AUTO and PGA_AGGREGATE_TARGET > 0. Default ON since 10gR1. Oracle dynamically manages the available memory to suit the workload, but of course it's not perfect! (A sketch follows below.) Joe Seneganik: Advanced Management Of Working Areas In Oracle 9i/10g, presented at UKOUG 2005.
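    A minimal sketch of switching an instance to automatic management; the target value is illustrative only:

      ALTER SYSTEM SET workarea_size_policy = AUTO;
      ALTER SYSTEM SET pga_aggregate_target = 2G;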
  • Slide 26
  • Auto PGA Parameters: Pre 10gR2 WORKAREA_SIZE_POLICY: set to AUTO. PGA_AGGREGATE_TARGET: the target for summed PGA across all processes; it can be exceeded if set too small (over-allocation). _PGA_MAX_SIZE: target maximum PGA size for a single process; default is a fixed value of 200Mb. Hidden/undocumented parameter, usual caveats apply.
  • Slide 27
  • Auto PGA Parameters : Pre 10gR2 _SMM_MAX_SIZE: limit for a single workarea operation for one process. Derived default: LEAST(5% of PGA_AGGREGATE_TARGET, 50% of _PGA_MAX_SIZE). It hits its limit of 100Mb when PGA_AGGREGATE_TARGET is >= 2000Mb and _PGA_MAX_SIZE is left at its default of 200Mb. Hidden/undocumented parameter, usual caveats apply.
  • Slide 28
  • Auto PGA Parameters : Pre 10gR2 _SMM_PX_MAX_SIZE: limit for all the parallel slaves of a single workarea operation. Derived default: 30% of PGA_AGGREGATE_TARGET. Hidden/undocumented parameter, usual caveats apply. Parallel slaves are still limited by _SMM_MAX_SIZE, so _SMM_PX_MAX_SIZE only bites beyond a certain degree of parallelism. (Worked example from the slide: with PGA_AGGREGATE_TARGET = 3000Mb and _PGA_MAX_SIZE = 200Mb, _SMM_MAX_SIZE = 100Mb and _SMM_PX_MAX_SIZE = 900Mb; eight slaves can have 100Mb each, but twelve slaves are capped at 75Mb each by the 900Mb limit.)
  • Slide 29
  • 10gR2 Improvements _SMM_MAX_SIZE is now the driver, with a more advanced algorithm: _PGA_MAX_SIZE = 2 * _SMM_MAX_SIZE. Parallel operations: _SMM_PX_MAX_SIZE = 50% of PGA_AGGREGATE_TARGET, and when the DOP is greater than 5, _SMM_PX_MAX_SIZE / DOP is used as the per-slave limit. (The original slide charted _SMM_MAX_SIZE against PGA_AGGREGATE_TARGET.) A check of the resulting behaviour follows below.
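    Whichever release, V$PGASTAT shows whether the automatic target is holding; a minimal check:

      -- A non-zero over allocation count means PGA_AGGREGATE_TARGET is
      -- set too small for the workload and is being exceeded.
      SELECT name, value
      FROM   v$pgastat
      WHERE  name IN ('aggregate PGA target parameter',
                      'total PGA allocated',
                      'over allocation count');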