Analysis Services Best Practices From Large Deployments
Nauzad Kapadia, Quartz
Key Takeaways
How to design your cubes efficiently
How to effectively partition your facts
How to optimize cube and query processing
How to ensure that your solution scales well
Agenda
Cube Design
Storage and Partitioning
Aggregations
Processing
Scalability
Tips for Designing Dimensions and Facts
Base fact data sources on views
Can use query hints
Can facilitate write-back partitions for measure groups containing semi-additive measures
Avoid Linked Dimensions
Use the Unknown Member
Tips for designing Attributes
Avoid unnecessary attributes
Use AttributeHierarchyEnabled property with care
Use Key Columns appropriately
Query performance
Dimension storage access is faster
Produces more optimal execution plans
Aggregation design
Enables aggregation design algorithm to produce effective set of aggregations
Dimension security
DeniedSet = {State.WA} should deny cities and customers in WA; this requires attribute relationships
How attribute relationships affect performance
After adding attribute relationships…
Don’t forget to remove redundant relationships!
All attributes have implicit relationship to key
Examples:
Customer → City (not redundant)
Customer → State (redundant)
Customer → Country (redundant)
Date → Month (not redundant)
Date → Quarter (redundant)
Date → Year (redundant)
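The redundancy rule above can be sketched in a few lines of Python. This is only an illustration, not an Analysis Services API: the function name `redundant_relationships` is hypothetical, and the City → State and State → Country relationships are assumed from the Customer example. A relationship is flagged when its target can already be reached through a chain of other relationships.

```python
from collections import defaultdict

def redundant_relationships(relationships):
    """Return the relationships already implied by a chain of other relationships."""
    graph = defaultdict(set)
    for source, target in relationships:
        graph[source].add(target)

    def reachable(start, goal, skip_edge):
        # Depth-first search that ignores the direct relationship being tested.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in graph[node]:
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == goal:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return sorted(edge for edge in relationships if reachable(edge[0], edge[1], edge))

# Assumed relationships for the Customer dimension (illustrative only).
customer_dim = [
    ("Customer", "City"), ("City", "State"), ("State", "Country"),
    ("Customer", "State"), ("Customer", "Country"),
]
print(redundant_relationships(customer_dim))
# [('Customer', 'Country'), ('Customer', 'State')]
```

Customer → City survives because no other chain leads from Customer to City.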
User Defined Hierarchies
Pre-defined navigation paths through dimensional space defined by attributes
Why create user defined hierarchies?
Guide end users to interesting navigation paths
Existing client tools are not “attribute-aware”
Performance
Optimize navigation path at processing time
Materialization of hierarchy tree on disk
Aggregation designer favors user defined hierarchies
Natural Hierarchies
1:M relation (via attribute relationships) between every pair of adjacent levels
Examples:
Country-State-City-Customer (natural)
Country-City (natural)
State-Customer (natural)
Age-Gender-Customer (unnatural)
Year-Quarter-Month (depends on key columns)
How many quarters and months?
4 & 12 across all years (unnatural)
4 & 12 for each year (natural)
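The Year-Quarter-Month case can be checked with a small Python sketch. It assumes the dimension rows are available as dictionaries; `is_natural` is a hypothetical helper, not part of any Analysis Services tooling. A hierarchy is treated as natural when every member of a level has exactly one parent in the level above.

```python
def is_natural(rows, levels):
    """True if every member of a level has exactly one parent in the level above."""
    for parent, child in zip(levels, levels[1:]):
        parents_of = {}
        for row in rows:
            parents_of.setdefault(row[child], set()).add(row[parent])
        if any(len(parents) > 1 for parents in parents_of.values()):
            return False
    return True

# Quarter and month keyed by number only: 4 quarters and 12 months across all years.
rows_simple = [
    {"Year": y, "Quarter": q, "Month": m}
    for y in (2005, 2006)
    for q in (1, 2, 3, 4)
    for m in range(3 * q - 2, 3 * q + 1)
]

# Composite keys: 4 quarters and 12 months for each year.
rows_composite = [
    {"Year": r["Year"],
     "Quarter": (r["Year"], r["Quarter"]),
     "Month": (r["Year"], r["Month"])}
    for r in rows_simple
]

levels = ["Year", "Quarter", "Month"]
print(is_natural(rows_simple, levels))     # False
print(is_natural(rows_composite, levels))  # True
```

With simple keys, quarter 1 rolls up to both 2005 and 2006, so the 1:M relation breaks. Including the year in the key columns restores it.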
Natural Hierarchies
Performance implications
Only natural hierarchies are materialized on disk during processing
Unnatural hierarchies are built on the fly during queries (and cached in memory)
Create natural hierarchies where possible
Using attribute relationships
Not always appropriate (e.g., Age-Gender)
Benefits of Partitioning
Partitions can be added, processed, deleted independently
Update to last month’s data does not affect prior months’ partitions
Sliding window scenario easy to implement
e.g., 24 month window: add June 2006 partition and delete June 2004
Partitions can have different storage settings
Storage mode (MOLAP, ROLAP, HOLAP)
Aggregation design
Alternate disk drive
Remote server
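The sliding window arithmetic can be sketched as follows. This is a minimal Python illustration under the assumption of one partition per calendar month; the function name `slide_window` is hypothetical.

```python
def slide_window(newest, window_months=24):
    """Return (partition_to_add, partition_to_delete) as (year, month) pairs."""
    year, month = newest
    # Count months from year 0, step back by the window length, convert back.
    index = year * 12 + (month - 1) - window_months
    return (year, month), (index // 12, index % 12 + 1)

print(slide_window((2006, 6)))
# ((2006, 6), (2004, 6))
```

Adding June 2006 and dropping June 2004 leaves July 2004 through June 2006, which is exactly 24 monthly partitions.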
Benefits of Partitioning
Partitions can be processed and queried in parallel
Better utilization of server resources
Reduced data warehouse load times
Queries are isolated to relevant partitions, so less data to scan
SELECT … FROM … WHERE [Time].[Year].[2006]
Queries only 2006 partitions
Bottom line: partitions enable:
Manageability
Performance
Scalability
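The idea of isolating a query to relevant partitions can be illustrated with a short Python sketch. The server does this internally; the partition names, the slice dictionaries, and `partitions_to_scan` are all hypothetical and only model the behavior described above.

```python
def partitions_to_scan(partitions, query_filter):
    """Return the partitions whose slice is compatible with the query filter."""
    return [
        name
        for name, data_slice in partitions.items()
        # A partition with no slice on an attribute cannot be ruled out.
        if all(data_slice.get(attr, value) == value
               for attr, value in query_filter.items())
    ]

# Hypothetical partitions, each described by the slice of data it holds.
partitions = {
    "Sales_2004": {"Year": 2004},
    "Sales_2005": {"Year": 2005},
    "Sales_2006_H1": {"Year": 2006},
    "Sales_2006_H2": {"Year": 2006},
    "Sales_Unsliced": {},
}

print(partitions_to_scan(partitions, {"Year": 2006}))
# ['Sales_2006_H1', 'Sales_2006_H2', 'Sales_Unsliced']
```

The unsliced partition is always scanned, which is one reason to keep slices accurate.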
Best Practices for Partitioning
No more than 20M rows per partition
Specify partition / data slice
Optional (but still recommended) for MOLAP: server auto-detects the slice and validates against user-specified slice (if any)
Should reflect, as closely as possible, the data in the partition
Must be specified for ROLAP
Remote partitions for scale out
Best Practices for Designing Partitions
Design from the start
Partition boundary and intervals
Determine what storage model and aggregation level fits best
Frequently queried → MOLAP with lots of aggs
Periodically queried → MOLAP with less or no aggs
Real-time → ROLAP with no aggs
Pick efficient data types in fact table
What is Proactive Caching
Benefits of Proactive caching
Considerations for using proactive caching
Use correct latency and silence settings
Useful in a transaction-oriented system in which changes are unpredictable
What are Aggregations
Benefits of Aggregations
Aggregating data in partitions
Aggregation Design Algorithm
Evaluate cost/benefit of aggregations
Relative to other aggregations
Designed in “waves” from top of pyramid
Cost is related to aggregation size
Benefit is related to “distance” from another aggregation
Storage design wizard
Assumes all combinations of attributes are equally likely
Can be done before you know the query load
Usage based optimization wizard
Assumes query pattern resembles your selection from the query log
Representative history is needed
Aggregation Design Algorithm
Examines the AggregationUsage property to build list of candidate attributes
Full: Every agg must include the attribute
None: No agg can include the attribute
Unrestricted: No restrictions on the algorithm
Default: Unrestricted if attribute is All, key or belongs to a natural hierarchy, None otherwise
Builds the attribute lattice
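The AggregationUsage rules above, including how Default resolves, can be written out as a small Python sketch. The function and field names are hypothetical; only the four setting names and the Default rule come from the slide.

```python
def effective_usage(setting, is_all, is_key, in_natural_hierarchy):
    """Resolve the Default setting to Unrestricted or None, per the rule above."""
    if setting != "Default":
        return setting
    if is_all or is_key or in_natural_hierarchy:
        return "Unrestricted"
    return "None"

def candidate_attributes(attributes):
    """Attributes the aggregation designer may consider (effective usage is not None)."""
    return [
        a["name"] for a in attributes
        if effective_usage(a["usage"], a["is_all"], a["is_key"], a["natural"]) != "None"
    ]

# Hypothetical attributes of a Customer dimension.
attributes = [
    {"name": "Customer", "usage": "Default", "is_all": False, "is_key": True, "natural": True},
    {"name": "City", "usage": "Default", "is_all": False, "is_key": False, "natural": True},
    {"name": "Phone", "usage": "Default", "is_all": False, "is_key": False, "natural": False},
    {"name": "Gender", "usage": "Unrestricted", "is_all": False, "is_key": False, "natural": False},
]
print(candidate_attributes(attributes))
# ['Customer', 'City', 'Gender']
```

Phone drops out because it is left at Default and does not belong to a natural hierarchy. Gender stays in only because it was set to Unrestricted explicitly.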
How to Monitor Aggregation Usage?
Profiler: aggregation hit (trace screenshot not included in the transcript)
Profiler: aggregation miss (trace screenshot not included in the transcript)
Best Practices for Aggregations
Define all possible attribute relationships
Set accurate attribute member counts and fact table counts
Set AggregationUsage to guide agg designer
Set rarely queried attributes to None
Set commonly queried attributes to Unrestricted
Do not build too many aggregations
In the 100s, not 1000s!
Do not build aggregations larger than 30% of fact table size (aggregation design algorithm doesn’t)
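The 30% guideline can be checked with a rough Python sketch. The size estimate used here (product of attribute member counts, capped at the fact table row count) is an assumption chosen for illustration and is not the server's actual estimator. `within_size_limit` is a hypothetical name.

```python
from math import prod

def within_size_limit(attribute_counts, fact_rows, limit=0.30):
    """True if the estimated aggregation size stays within the limit."""
    # Assumed upper bound: every combination of members occurs, capped at fact rows.
    estimated_rows = min(prod(attribute_counts), fact_rows)
    return estimated_rows <= limit * fact_rows

fact_rows = 10_000_000
print(within_size_limit([50, 24], fact_rows))         # State x Month: True
print(within_size_limit([2_000_000, 24], fact_rows))  # Customer x Month: False
```

This also shows why accurate member counts and fact table counts matter: the estimate is only as good as the counts fed into it.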
Best Practices for Aggregations
Aggregation design cycle
Use Storage Design Wizard (~20% perf gain) to design initial set of aggregations
Enable query log and run pilot workload (beta test with limited set of users)
Use Usage Based Optimization (UBO) Wizard to refine aggregations
Use larger perf gain (70-80%)
Reprocess partitions for new aggregations to take effect
Periodically use UBO to refine aggregations
Processing Options
ProcessFull: Fully processes the object from scratch
ProcessClear: Clears all data, brings object to unprocessed state
ProcessData: Reads and stores fact data only (no aggs or indexes)
ProcessIndexes: Builds aggs and indexes
ProcessUpdate: Incremental update of dimension (preserves fact data)
ProcessAdd: Adds new rows to dimension or partition
Best Practices for Processing
Use XMLA scripts in large production systems
Automation (e.g., using ascmd)
Finer control over parallelism, transactions, memory usage, etc.
Don’t just process the entire database!
Dimension processing
Performance is limited by attribute relationships
Key attribute is a big bottleneck
Define all possible attribute relationships
Eliminate redundant relationships—especially on key!
Bind Dimension data sources to views instead of tables or named queries
Best Practices for Processing
Partition processing
Split ProcessFull into ProcessData + ProcessIndexes for large partitions (consumes less memory)
Monitor aggregation processing spilling to disk (perfmon counters for temp file usage)
Add memory, turn on /3GB, move to x64/ia64
Fully process partitions periodically
Achieves better compression over repeated incremental processing
Data sources
Avoid using .NET data sources: OLEDB is an order of magnitude faster for processing
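Splitting ProcessFull into ProcessData followed by ProcessIndexes can be scripted. The Python sketch below only generates an XMLA Batch as text; running it (for example with ascmd) is left out. The object IDs are hypothetical placeholders and `split_process_full` is an illustrative helper, not a Microsoft tool.

```python
import xml.etree.ElementTree as ET

XMLA_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"

def process_command(object_ids, process_type):
    """Build one XMLA Process command for the given object and processing type."""
    command = ET.Element("Process")
    obj = ET.SubElement(command, "Object")
    for tag, value in object_ids.items():
        ET.SubElement(obj, tag).text = value
    ET.SubElement(command, "Type").text = process_type
    return command

def split_process_full(object_ids):
    """Return a Batch that runs ProcessData, then ProcessIndexes, on one partition."""
    batch = ET.Element("Batch", xmlns=XMLA_NS)
    for step in ("ProcessData", "ProcessIndexes"):
        batch.append(process_command(object_ids, step))
    return ET.tostring(batch, encoding="unicode")

# Hypothetical object IDs; replace with the IDs from your own database.
partition = {
    "DatabaseID": "SalesDW",
    "CubeID": "Sales",
    "MeasureGroupID": "FactSales",
    "PartitionID": "Sales_2006_06",
}
script = split_process_full(partition)
print(script)
```

The two commands are placed directly in the Batch rather than inside a Parallel element, so the index build follows the data load for the same partition.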
Improving multi-user performance
Increase Query parallelism
Block long running queries
Use a load balancing cluster
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.