Analysis Services Best Practices From Large Deployments

Nauzad Kapadia, Quartz Systems


Page 2: Analysis Services   Best Practices From Large Deployments

Key Takeaways

How to design your cubes efficiently

How to effectively partition your facts

How to optimize cube and query processing

How to ensure that your solution scales well


Agenda

Cube Design

Storage and Partitioning

Aggregations

Processing

Scalability


Tips for Designing Dimensions and Facts

Base fact data sources on views:

Can use query hints

Can facilitate write-back partitions for measure groups containing semi-additive measures

Avoid Linked Dimensions

Use the Unknown Member
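
The Unknown Member lets fact rows whose dimension key has no match still be counted instead of failing processing or being discarded. As a rough sketch of that effect only (not how Analysis Services implements it), the Python below resolves fact keys against a dimension and routes misses to a hypothetical surrogate key `UNKNOWN_KEY = -1`; the function and sample names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical surrogate key standing in for the dimension's Unknown member.
UNKNOWN_KEY = -1


def map_fact_keys(
    fact_rows: List[Tuple[str, float]], dim_keys: Dict[str, int]
) -> List[Tuple[int, float]]:
    """Resolve each fact row's business key to a dimension surrogate key.

    Rows whose key is missing from the dimension are attributed to the
    Unknown member instead of being dropped, so their measures still
    contribute to the grand total.
    """
    return [(dim_keys.get(key, UNKNOWN_KEY), amount) for key, amount in fact_rows]


customers = {"C001": 1, "C002": 2}
facts = [("C001", 10.0), ("C999", 5.0), ("C002", 7.5)]
resolved = map_fact_keys(facts, customers)
print(resolved)  # [(1, 10.0), (-1, 5.0), (2, 7.5)]
```

The orphan row for "C999" keeps its 5.0, so totals match the source data.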


Tips for Designing Attributes

Avoid unnecessary attributes

Use the AttributeHierarchyEnabled property with care

Use Key Columns appropriately


Benefits of Attribute Relationships

Query performance:

Dimension storage access is faster

Produces more optimal execution plans

Aggregation design: enables the aggregation design algorithm to produce an effective set of aggregations

Dimension security: DeniedSet = {State.WA} should also deny the cities and customers in WA, which requires attribute relationships
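
Why the DeniedSet example depends on attribute relationships can be sketched with a toy member-level model: each member points to its parent one level up (Customer → City → State), and denying a state hides everything whose chain reaches it. The member names and the `PARENT` map are hypothetical, and this only illustrates the propagation, not the server's security engine.

```python
from typing import Dict, Set

# Hypothetical member-level relationships derived from the attribute
# relationships Customer -> City -> State (assumed to contain no cycles).
PARENT: Dict[str, str] = {
    "Customer:Alice": "City:Seattle",
    "Customer:Bob": "City:Spokane",
    "Customer:Carol": "City:Portland",
    "City:Seattle": "State:WA",
    "City:Spokane": "State:WA",
    "City:Portland": "State:OR",
}


def ancestors(member: str, parent: Dict[str, str]) -> Set[str]:
    """Follow the relationship chain upward, collecting every ancestor."""
    found: Set[str] = set()
    while member in parent:
        member = parent[member]
        found.add(member)
    return found


def denied_members(denied_set: Set[str], parent: Dict[str, str]) -> Set[str]:
    """The denied members plus every member whose chain reaches one of them."""
    hidden = set(denied_set)
    for member in parent:
        if ancestors(member, parent) & denied_set:
            hidden.add(member)
    return hidden


print(sorted(denied_members({"State:WA"}, PARENT)))
# Without relationships nothing links customers or cities to states:
print(sorted(denied_members({"State:WA"}, {})))
```

With the relationships in place, Seattle, Spokane, Alice, and Bob are hidden along with WA; with an empty relationship map only WA itself is.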


How attribute relationships affect performance


After adding attribute relationships…

Don’t forget to remove redundant relationships!

All attributes have an implicit relationship to the key attribute

Examples:

Customer → City (not redundant)

Customer → State (redundant)

Customer → Country (redundant)

Date → Month (not redundant)

Date → Quarter (redundant)

Date → Year (redundant)
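
A relationship is redundant when the same attributes are already linked through a longer chain, e.g. Customer → State is implied by Customer → City → State. The sketch below is a plain transitive-reduction check over a list of attribute relationships (assumed to contain no cycles) that flags such edges; it is not an Analysis Services API, just a way to reason about the examples above.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (from attribute, to attribute), e.g. ("Customer", "City")


def reachable(start: str, graph: Dict[str, Set[str]], skip: Edge) -> Set[str]:
    """Attributes reachable from `start` without using the edge `skip`."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, set()):
            if (node, nxt) == skip or nxt in seen:
                continue
            seen.add(nxt)
            stack.append(nxt)
    return seen


def redundant_relationships(edges: List[Edge]) -> List[Edge]:
    """Relationships that are already implied by a chain of other ones."""
    graph: Dict[str, Set[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    return [edge for edge in edges if edge[1] in reachable(edge[0], graph, skip=edge)]


customer_edges = [
    ("Customer", "City"), ("City", "State"), ("State", "Country"),
    ("Customer", "State"), ("Customer", "Country"),
]
print(redundant_relationships(customer_edges))
# [('Customer', 'State'), ('Customer', 'Country')]
```

Running the same check on the Date example flags Date → Quarter and Date → Year, matching the slide.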


User-Defined Hierarchies

Predefined navigation paths through dimensional space, defined by attributes

Why create user-defined hierarchies?

Guide end users to interesting navigation paths

Existing client tools are not “attribute-aware”

Performance:

Optimize navigation path at processing time

Materialization of hierarchy tree on disk

Aggregation designer favors user-defined hierarchies


Natural Hierarchies

A 1:M relationship (via attribute relationships) between every pair of adjacent levels

Examples:

Country-State-City-Customer (natural)

Country-City (natural)

State-Customer (natural)

Age-Gender-Customer (unnatural)

Year-Quarter-Month (depends on key columns): how many quarters and months?

4 and 12 across all years (unnatural)

4 and 12 for each year (natural)
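
Whether Year-Quarter-Month is natural can be tested directly against the data: every member of a lower level must roll up to exactly one member of the level above. In the sketch below, `calendar_rows` is a hypothetical helper that builds calendar rows with either bare keys (quarter 1 to 4, month 1 to 12) or composite (year, quarter) and (year, month) keys, matching the two cases on the slide.

```python
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def is_natural(rows: List[Row], levels: Sequence[str]) -> bool:
    """True if each pair of adjacent levels has a 1:M relationship, i.e.
    every member key of the lower level maps to exactly one parent member."""
    for parent, child in zip(levels, levels[1:]):
        parent_of: Dict[Hashable, Hashable] = {}
        for row in rows:
            if parent_of.setdefault(row[child], row[parent]) != row[parent]:
                return False  # one child key appears under two parents
    return True


def calendar_rows(years: Sequence[int], composite_keys: bool) -> List[Row]:
    """Hypothetical date rows keyed either by bare numbers or by year."""
    rows: List[Row] = []
    for year in years:
        for month in range(1, 13):
            quarter = (month - 1) // 3 + 1
            rows.append({
                "Year": year,
                "Quarter": (year, quarter) if composite_keys else quarter,
                "Month": (year, month) if composite_keys else month,
            })
    return rows


levels = ["Year", "Quarter", "Month"]
print(is_natural(calendar_rows([2005, 2006], composite_keys=False), levels))  # False
print(is_natural(calendar_rows([2005, 2006], composite_keys=True), levels))   # True
```

With bare keys, quarter 1 belongs to both 2005 and 2006, so the hierarchy is unnatural; including the year in the key columns makes it natural.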


Natural Hierarchies

Performance implications:

Only natural hierarchies are materialized on disk during processing

Unnatural hierarchies are built on the fly during queries (and cached in memory)

Create natural hierarchies where possible, using attribute relationships

Not always appropriate (e.g., Age-Gender)


Benefits of Partitioning

Partitions can be added, processed, and deleted independently

An update to last month’s data does not affect prior months’ partitions

Sliding window scenarios are easy to implement

e.g., for a 24-month window, add the June 2006 partition and delete June 2004

Partitions can have different storage settings:

Storage mode (MOLAP, ROLAP, HOLAP)

Aggregation design

Alternate disk drive

Remote server
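
The 24-month example reduces to simple month arithmetic: after adding the new month, drop every partition older than the window. The sketch assumes one partition per month, identified by the first day of that month; `slide_window` is a hypothetical helper, and actually creating or deleting the partitions (for example through an XMLA script) is not shown.

```python
from datetime import date
from typing import List, Tuple


def month_index(d: date) -> int:
    """Months since year 0, so month arithmetic is plain integer math."""
    return d.year * 12 + (d.month - 1)


def month_from_index(i: int) -> date:
    return date(i // 12, i % 12 + 1, 1)


def slide_window(current: List[date], new_month: date,
                 window: int) -> Tuple[List[date], List[date]]:
    """Return (partitions to add, partitions to drop) so that only the
    latest `window` monthly partitions remain once new_month is added."""
    to_add = [] if new_month in current else [new_month]
    oldest_kept = month_index(new_month) - window + 1
    to_drop = [m for m in current if month_index(m) < oldest_kept]
    return to_add, to_drop


# June 2004 .. May 2006: a full 24-month window before the June load.
start = month_index(date(2004, 6, 1))
current = [month_from_index(start + i) for i in range(24)]
add, drop = slide_window(current, date(2006, 6, 1), window=24)
print(add, drop)  # [datetime.date(2006, 6, 1)] [datetime.date(2004, 6, 1)]
```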


Benefits of Partitioning

Partitions can be processed and queried in parallel

Better utilization of server resources

Reduced data warehouse load times

Queries are isolated to relevant partitions, so there is less data to scan

SELECT … FROM … WHERE [Time].[Year].[2006]

Queries only the 2006 partitions

Bottom line, partitions enable:

Manageability

Performance

Scalability
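
Partition elimination can be pictured as a filter over partition slices: a query restricted to [Time].[Year].[2006] needs only the partitions whose slice can contain 2006. In this sketch the `Partition` class, its names, and its row counts are hypothetical, and a partition without a known slice is conservatively kept, since nothing rules it out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Partition:
    name: str
    rows: int
    slice_year: Optional[int] = None  # None: slice unknown / not specified


def partitions_to_scan(partitions: List[Partition], year: int) -> List[Partition]:
    """Skip a partition only when its slice is known and excludes `year`."""
    return [p for p in partitions if p.slice_year is None or p.slice_year == year]


partitions = [
    Partition("Sales_2004", 100_000, 2004),
    Partition("Sales_2005", 120_000, 2005),
    Partition("Sales_2006", 150_000, 2006),
    Partition("Sales_NoSlice", 50_000),
]
scanned = partitions_to_scan(partitions, 2006)
print([p.name for p in scanned], sum(p.rows for p in scanned))
```

Only 200,000 of the 420,000 rows are touched; the unsliced partition is scanned on every query.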


Best Practices for Partitioning

No more than 20M rows per partition

Specify the partition’s data slice:

Optional (but still recommended) for MOLAP: the server auto-detects the slice and validates it against the user-specified slice (if any)

Should reflect, as closely as possible, the data in the partition

Must be specified for ROLAP

Use remote partitions for scale-out
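
The 20M-row guideline translates into a quick sizing check when choosing partition boundaries. A minimal sketch, assuming monthly row counts are already known from the source system (the month labels and counts here are made up):

```python
from typing import Dict

MAX_ROWS_PER_PARTITION = 20_000_000  # guideline from the slide


def partitions_needed(rows: int, cap: int = MAX_ROWS_PER_PARTITION) -> int:
    """Smallest number of partitions of at most `cap` rows holding `rows`."""
    return max(1, (rows + cap - 1) // cap)  # integer ceiling division


def oversized_months(monthly_rows: Dict[str, int],
                     cap: int = MAX_ROWS_PER_PARTITION) -> Dict[str, int]:
    """Months exceeding the cap, mapped to how many partitions they need."""
    return {month: partitions_needed(rows, cap)
            for month, rows in monthly_rows.items() if rows > cap}


print(oversized_months({"2006-05": 18_000_000, "2006-06": 45_000_000}))
# {'2006-06': 3}
```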


Best Practices for Designing Partitions

Design from the start

Partition boundary and intervals

Determine which storage mode and aggregation level fit best:

Frequently queried: MOLAP with lots of aggregations

Periodically queried: MOLAP with fewer or no aggregations

Real-time: ROLAP with no aggregations

Pick efficient data types in the fact table


What Is Proactive Caching?

Benefits of Proactive Caching

Considerations for using proactive caching:

Use correct latency and silence settings

Useful in a transaction-oriented system in which changes are unpredictable
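
The silence setting behaves like a debounce: processing waits until no change notification has arrived for the silence interval, so a burst of updates triggers one rebuild. The sketch below models that, plus an optional cap on the wait in the spirit of the silence-override setting, which keeps a constantly changing transaction system from waiting forever. It is a simplified model with times in seconds; latency (how long stale MOLAP data may still be served) is not modeled, and the function name is hypothetical.

```python
import math
from typing import List, Optional


def rebuild_start_times(changes: List[float], silence: float,
                        override: Optional[float] = None) -> List[float]:
    """Times at which a rebuild would start for sorted change timestamps.

    A rebuild starts once `silence` seconds pass without a new change or,
    if `override` is given, `override` seconds after the first change of
    the burst, whichever comes first. Later changes begin a new burst.
    """
    starts: List[float] = []
    i = 0
    while i < len(changes):
        deadline = changes[i] + override if override is not None else math.inf
        last = changes[i]
        i += 1
        # Absorb changes that arrive before the quiet period (or cap) ends.
        while i < len(changes) and changes[i] < min(last + silence, deadline):
            last = changes[i]
            i += 1
        starts.append(min(last + silence, deadline))
    return starts


print(rebuild_start_times([0, 2, 4, 20], silence=10))                # [14, 30]
print(rebuild_start_times([0, 2, 4, 6, 8], silence=10, override=5))  # [5, 11]
```

Too short a silence interval rebuilds on every trickle of changes; too long a one delays fresh data, which is why the settings need tuning per system.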


What are Aggregations

Benefits of Aggregations

Aggregating data in partitions


Aggregation Design Algorithm

Evaluates the cost/benefit of aggregations relative to other aggregations

Designed in “waves” from the top of the pyramid

Cost is related to aggregation size

Benefit is related to “distance” from another aggregation

Storage Design Wizard:

Assumes all combinations of attributes are equally likely

Can be done before you know the query load

Usage-Based Optimization Wizard:

Assumes the query pattern resembles your selection from the query log

Representative history is needed



Aggregation Design Algorithm

Examines the AggregationUsage property to build the list of candidate attributes

Full: Every agg must include the attribute

None: No agg can include the attribute

Unrestricted: No restrictions on the algorithm

Default: Unrestricted if attribute is All, key or belongs to a natural hierarchy, None otherwise

Builds the attribute lattice
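
How the AggregationUsage settings shape the search space can be shown by enumerating the attribute combinations the designer is allowed to consider: Full attributes appear in every candidate, None attributes in none, and Unrestricted ones in any combination, so the lattice grows as 2^n in the number of Unrestricted attributes. This is a simplified, single-dimension sketch rather than the actual designer; `resolve_default` encodes the Default rule from the slide (the All level corresponds to the empty combination and is not modeled as an attribute), and all attribute names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def resolve_default(attribute: str, usage: str, key_attribute: str,
                    natural_hierarchy_attrs: Set[str]) -> str:
    """Default -> Unrestricted for the key attribute or an attribute in a
    natural hierarchy, "None" otherwise (the rule stated on the slide)."""
    if usage != "Default":
        return usage
    if attribute == key_attribute or attribute in natural_hierarchy_attrs:
        return "Unrestricted"
    return "None"  # the AggregationUsage value, not Python's None


def candidate_aggregations(usage: Dict[str, str]) -> List[FrozenSet[str]]:
    """Attribute combinations permitted by resolved AggregationUsage values."""
    full = frozenset(a for a, u in usage.items() if u == "Full")
    free = sorted(a for a, u in usage.items() if u == "Unrestricted")
    return [full | frozenset(combo)
            for size in range(len(free) + 1)
            for combo in combinations(free, size)]


raw = {"Category": "Full", "Subcategory": "Default",
       "Product": "Default", "Color": "Default"}
usage = {a: resolve_default(a, u, key_attribute="Product",
                            natural_hierarchy_attrs={"Category", "Subcategory", "Product"})
         for a, u in raw.items()}
for candidate in candidate_aggregations(usage):
    print(sorted(candidate))
```

Each attribute moved from Unrestricted to None halves the candidate space, which is the lever the best-practice slides recommend for rarely queried attributes.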


How to Monitor Aggregation Usage?

Profiler (screenshots: an aggregation hit and an aggregation miss)


Best Practices for Aggregations

Define all possible attribute relationships

Set accurate attribute member counts and fact table row counts

Set AggregationUsage to guide the aggregation designer:

Set rarely queried attributes to None

Set commonly queried attributes to Unrestricted

Do not build too many aggregations: in the 100s, not the 1000s!

Do not build aggregations larger than 30% of the fact table size (the aggregation design algorithm doesn’t)
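
Why accurate member counts matter, and what the 30% limit means, can be seen with a naive size estimate: an aggregation over a set of attributes has at most the product of their member counts in rows, and never more than the fact table. The estimator below is a deliberate simplification (the real designer's estimate is more refined) and all counts are hypothetical; note how a stale Customer count lets an oversized aggregation slip under the limit.

```python
from math import prod
from typing import Dict, Iterable


def estimated_rows(attrs: Iterable[str], member_counts: Dict[str, int],
                   fact_rows: int) -> int:
    """Naive upper bound on aggregation rows: product of member counts,
    capped at the number of fact rows."""
    return min(prod(member_counts[a] for a in attrs), fact_rows)


def within_limit(attrs: Iterable[str], member_counts: Dict[str, int],
                 fact_rows: int, max_percent: int = 30) -> bool:
    """True if the estimated aggregation is at most max_percent of the facts."""
    return estimated_rows(attrs, member_counts, fact_rows) * 100 <= max_percent * fact_rows


fact_rows = 10_000_000
accurate = {"Month": 24, "Product": 1_000, "Customer": 500_000}
stale = dict(accurate, Customer=5_000)  # count never updated after growth

print(within_limit(["Month", "Product"], accurate, fact_rows))   # True (24,000 rows)
print(within_limit(["Month", "Customer"], accurate, fact_rows))  # False (capped at 10M)
print(within_limit(["Month", "Customer"], stale, fact_rows))     # True (looks like 120,000)
```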


Best Practices for Aggregations

Aggregation design cycle:

Use the Storage Design Wizard (~20% perf gain) to design the initial set of aggregations

Enable the query log and run a pilot workload (beta test with a limited set of users)

Use the Usage-Based Optimization (UBO) Wizard to refine aggregations

Use a larger perf gain (70-80%)

Reprocess partitions for the new aggregations to take effect

Periodically use UBO to refine aggregations


Processing Options

ProcessFull: fully processes the object from scratch

ProcessClear: clears all data, bringing the object to an unprocessed state

ProcessData: reads and stores fact data only (no aggregations or indexes)

ProcessIndexes: builds aggregations and indexes

ProcessUpdate: incremental update of a dimension (preserves fact data)

ProcessAdd: adds new rows to a dimension or partition


Best Practices for Processing

Use XMLA scripts in large production systems:

Automation (e.g., using ascmd)

Finer control over parallelism, transactions, memory usage, etc.

Don’t just process the entire database!

Dimension processing:

Performance is limited by attribute relationships

The key attribute is a big bottleneck

Define all possible attribute relationships

Eliminate redundant relationships, especially on the key attribute!

Bind dimension data sources to views instead of tables or named queries


Best Practices for Processing

Partition processing:

Split ProcessFull into ProcessData + ProcessIndexes for large partitions; this consumes less memory

Monitor aggregation processing spilling to disk (perfmon counters for temp file usage)

Add memory, turn on /3GB, move to x64/ia64

Fully process partitions periodically: this achieves better compression than repeated incremental processing

Data sources: avoid using .NET data sources; OLE DB is an order of magnitude faster for processing
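
Splitting ProcessFull is straightforward to script: run ProcessData for the large partitions, then ProcessIndexes as a second step, with each batch processing its partitions in parallel. The Python below only generates the XMLA text. The element names follow the usual XMLA Batch/Parallel/Process layout in the Analysis Services engine namespace, but treat the exact structure as an assumption to verify against the XMLA reference for your version; the database, cube, measure group, and partition IDs are hypothetical. The resulting scripts could then be submitted with a tool such as ascmd.

```python
import xml.etree.ElementTree as ET
from typing import List

ENGINE_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"


def process_batch(database: str, cube: str, measure_group: str,
                  partitions: List[str], process_type: str,
                  max_parallel: int = 4) -> str:
    """Build an XMLA Batch that processes the given partitions in parallel."""
    # Declaring xmlns on the root puts every child element in the engine namespace.
    batch = ET.Element("Batch", xmlns=ENGINE_NS)
    parallel = ET.SubElement(batch, "Parallel", MaxParallel=str(max_parallel))
    for partition in partitions:
        process = ET.SubElement(parallel, "Process")
        obj = ET.SubElement(process, "Object")
        for tag, value in (("DatabaseID", database), ("CubeID", cube),
                           ("MeasureGroupID", measure_group),
                           ("PartitionID", partition)):
            ET.SubElement(obj, tag).text = value
        ET.SubElement(process, "Type").text = process_type
    return ET.tostring(batch, encoding="unicode")


partitions = ["Sales_2006_05", "Sales_2006_06"]
# Step 1 reads and stores fact data only; step 2 builds aggregations and indexes.
data_step = process_batch("SalesDW", "Sales", "Internet Sales", partitions, "ProcessData")
index_step = process_batch("SalesDW", "Sales", "Internet Sales", partitions, "ProcessIndexes")
print(data_step)
```

Running the two batches one after the other keeps the memory-hungry aggregation build out of the data-read phase, which is the point of the split.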


Improving multi-user performance

Increase query parallelism

Block long-running queries

Use a load balancing cluster


© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.