Designing, Building, and Maintaining Large Cubes using Lessons Learned


Transcript of Designing, Building, and Maintaining Large Cubes using Lessons Learned

Page 1: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Design, Building & Maintaining large cubes using Lessons Learned

Nicholas Dritsas, Eric Jacobsen, Denny Lee

SQL Server Customer Advisory Team

Microsoft Corp.

Page 2: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Customer Advisory Team

• Works on the largest, most complex SQL Server projects worldwide
  • US: NASDAQ, USDA, Verizon, Raymond James…
  • Europe: London Stock Exchange, Barclays Capital
  • Asia and Pacific: Korea Telecom, Western Digital, Japan Railways East
  • ISVs: SAP, Siebel, SharePoint, GE Healthcare

• Drives product requirements back into SQL Server from our customers and ISVs

• Shares best practices with the SQL Server community
  • http://blogs.msdn.com/sqlcat - CAT team blog
  • http://blogs.msdn.com/mssqlisv - ISV blog
  • http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/default.mspx
  • Coming soon: http://www.sqlcat.com – technical notes and case studies

We are wearing the orange shirts during the conference. Stop by, say hello, and feel free to ask us any questions.

Page 3: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Agenda

• Design
  • Dimensions
  • Cubes
  • Aggregations

• Build
  • Scalability – Processing
  • Scalability – Queries
  • Scalability options for multi-user queries
  • Best Practices

• Maintain
  • Monitor

DBA-415-M – Building and Maintaining Large Cubes Lessons Learned


Page 5: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Designing Dimensions
Slowly changing and large dimensions

• Slowly changing dimensions, Type 2:
  • Minimize data updates to avoid cube reprocessing
  • If you must update, run ProcessAdd every evening and perform weekly full processing. NOTE: This is only available in XMLA

• Large dimensions:
  • Use natural hierarchies
  • Dimension SQL queries are in the form of "select distinct colA, colB, … from [DimensionTable]"
  • Many hierarchies introduce many select distinct statements. Look at tuning the SQL indexes.
  • See TK Anand’s article, http://msdn2.microsoft.com/en-us/library/ms345142.aspx, for more details

• Maximum size of dimensions:
  • Successful implementations with 10 million members.
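The nightly ProcessAdd / weekly ProcessFull pattern above can be sketched as an XMLA command (the database and dimension IDs below are placeholders, not names from this deck):

```xml
<!-- Nightly incremental dimension processing (sketch; IDs are placeholders).
     ProcessAdd appends new members without invalidating existing partitions. -->
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>MyDatabase</DatabaseID>
    <DimensionID>Customer</DimensionID>
  </Object>
  <Type>ProcessAdd</Type>
</Process>
<!-- Weekly: the same command with <Type>ProcessFull</Type> rebuilds the dimension. -->
```

Run via an XMLA query window or a scheduled ascmd.exe job; as the slide notes, ProcessAdd for dimensions is exposed only through XMLA, not the BIDS/SSMS processing dialogs.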


Page 6: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Designing Cubes
Using partitions

• Partition by time plus another dimension, such as geography
• For real-time BI, you may want to have only the most recent partition in ROLAP, with the other partitions in MOLAP.
• NOTE: When data changes, the entire data cache for the measure group is discarded. So it may make sense to separate cubes or measure groups into "static" and "real-time" analysis.
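A time-plus-geography partition is typically bound to a filtered source query. As a sketch (every table, column, and object name below is a placeholder):

```xml
<!-- Partition sliced by time and geography (sketch; all names are placeholders) -->
<Partition xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <ID>Sales_2007Q3_NorthAmerica</ID>
  <Name>Sales 2007 Q3 North America</Name>
  <Source xsi:type="QueryBinding">
    <DataSourceID>MyDataSource</DataSourceID>
    <QueryDefinition>
      SELECT * FROM FactSales
      WHERE DateKey BETWEEN 20070701 AND 20070930
        AND RegionKey = 1   -- hypothetical North America key
    </QueryDefinition>
  </Source>
  <StorageMode>Molap</StorageMode>  <!-- Rolap for the most recent partition -->
</Partition>
```

The only part that changes for the real-time scenario is <StorageMode> on the newest partition.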


Page 7: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Designing Aggregations

Here is an optimized method:
1) Create your aggregations via the Aggregation Wizard at 5-10%
2) Turn on the query log and set the sampling to 1 to record all queries. Delete queries from the OlapQueryLog table that have a Duration < 100 ms, as those queries are already fast.
3) Run a set of MDX queries that best represent the type of questions your users will typically ask.
4) With the OlapQueryLog table full of data, write a SQL statement that returns only the rows for slow queries and/or queries executed often.
5) This SQL statement can then be used in the Aggregation Manager sample (found on CodePlex or in the SQL Server SP2 samples)
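Step 4 might look like the following SQL. The column names follow the default OlapQueryLog schema; the thresholds are illustrative, not prescribed by this deck:

```sql
-- Keep only slow and/or frequently executed subcube requests.
-- Dataset identifies the requested subcube; Duration is in milliseconds.
SELECT MSOLAP_Database,
       Dataset,
       COUNT(*)      AS Hits,
       AVG(Duration) AS AvgDurationMs
FROM   OlapQueryLog
WHERE  Duration >= 100               -- the fast (<100 ms) rows were deleted in step 2
GROUP  BY MSOLAP_Database, Dataset
HAVING COUNT(*) >= 5                 -- executed often
    OR AVG(Duration) >= 1000         -- and/or slow on average
ORDER  BY AvgDurationMs DESC;
```

The resulting Dataset strings are what the Aggregation Manager sample consumes when designing targeted aggregations.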


Page 8: Designing, Building, and Maintaining Large Cubes using Lessons Learned


Demonstration #1

Aggregation Manager

Page 9: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Agenda

• Design
  • Dimensions
  • Cubes
  • Aggregations

• Build
  • Scalability – Processing
  • Scalability – Queries
  • Scalability options for multi-user queries
  • Best Practices

• Maintain
  • Monitor


Page 10: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability - Processing

• Scale by processing many partitions in parallel
• No more than 2,000-4,000 total partitions
  • If you need more, ensure you have installed build 3166 or later

• Parallelism is applicable when:
  • dimensions change and partitions need to be reprocessed
  • adding or modifying measure groups
  • changing the aggregation design


Page 11: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability - Processing

[Diagram: the partition processing pipeline.
Process Data: read data from SQL Server → look up dimension keys → write to *.fact.data files.
Process Indexes: read the *.fact.data files → build *.map files → write to *.agg.*.data files.]

Page 12: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability - Processing
Use ProcessData and ProcessIndexes

• ProcessFull is the default method; it executes the ProcessData and ProcessIndexes jobs.
• Processing completes faster and AS uses fewer memory resources when ProcessData and ProcessIndexes are run separately.

Process enumeration | Processing time
Two-step process    | 00:12:23
  ProcessData       | 00:10:18
  ProcessIndexes    | 00:02:05
ProcessFull         | 00:13:34
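The two-step approach can be sketched as two XMLA commands run back to back (the object IDs below are placeholders):

```xml
<!-- Step 1: read the source data and write the fact files (sketch; IDs are placeholders) -->
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>MyDatabase</DatabaseID>
    <CubeID>MyCube</CubeID>
    <MeasureGroupID>Sales</MeasureGroupID>
  </Object>
  <Type>ProcessData</Type>
</Process>
<!-- Step 2: build aggregations and bitmap indexes over the stored data -->
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>MyDatabase</DatabaseID>
    <CubeID>MyCube</CubeID>
    <MeasureGroupID>Sales</MeasureGroupID>
  </Object>
  <Type>ProcessIndexes</Type>
</Process>
```

Because step 2 only reads files AS has already written, it does not touch the relational source, which is where the memory savings come from.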


Page 13: Designing, Building, and Maintaining Large Cubes using Lessons Learned

[Diagram: ProcessFull decomposes into ProcessData followed by ProcessIndexes.]


Page 14: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability - Processing

• ProcessData, rule of thumb:
  • 40-80 K rows/sec per partition
  • Best if you have:
    • Integer keys
    • Fewer than 10 measures
    • No SQL joins

• Example:
  • One customer with many partitions in our lab saw 400 K rows/sec sustained and 700 K rows/sec peaks on an 8-CPU machine

• We have seen customers with hundreds of measures


Page 15: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability - Processing

• Next slide – Show Me the Numbers
  • Project REAL data
  • Unisys 16-processor machine
  • Chose the 16 partitions most similar in size
  • For each data point <n>, set MaxParallel=<n> and used <n> partitions
  • Integer keys, 2-part composite keys


Page 16: Designing, Building, and Maintaining Large Cubes using Lessons Learned


Page 17: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability - Processing

• Why the performance loss at 16?
  • On this machine, with this configuration and this data, the memory quota limited processing to 12 partitions in parallel (based on estimates), so the last 4 had to wait
  • Adding more memory can help
  • Imagine a Gantt chart showing which partitions run at a time
• Takeaway: many things can reduce scalability, but it is often possible to get good performance


Page 18: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability - Processing
• Distinct Count
  • Limit to one distinct count measure per measure group
  • Distinct Count cannot use the in-memory DataCache to answer storage engine queries – it always goes to the fact table partitions
  • During processing, SQL Server does the sorting
  • Items to look for:
    • Memory grants on SQL Server could limit processing to only 3 partitions in parallel, if the query plan generated for these queries exceeds 1/4 of the memory on SQL Server.
    • Watch the perfmon counters "SQLServer:Memory Manager\Memory Grants Outstanding" and "SQLServer:Memory Manager\Memory Grants Pending".
    • If you need to process more than 3 partitions in parallel, contact CSS
  • SQL query timeout error HYT00:
    • Modify <ExternalTimeout>.
    • A query will be canceled if no rows are returned within that time.
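This timeout is a server property, editable in msmdsrv.ini or via the advanced properties page in Management Studio. As a sketch (the value is illustrative; in later documentation the property appears as ExternalCommandTimeout):

```xml
<!-- msmdsrv.ini fragment (sketch; value in seconds, illustrative only) -->
<ConfigurationSettings>
  <ExternalCommandTimeout>3600</ExternalCommandTimeout>
</ConfigurationSettings>
```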


Page 19: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability - Processing

• Dimension processing, potential concerns
  • Longest pole (Gantt chart analogy) when processing a very large dimension, e.g. 10 million members
  • Size limitation – 4 GB for the "string store", stored in Unicode, with a 6-byte per-string overhead
    • E.g. for a 50-character name: 4*1024*1024*1024 / (6 + 50*2) = 40.5 million members
    • Consider the size of other properties/attributes
  • We saw a recent case of discretization of a 10-million-member customer dimension – workarounds include doing it in SQL using NTILE or defining a hierarchy. (Go back to the business logic.)
  • Usually the bigger concern is the impact of changing dimensions
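The string-store arithmetic above generalizes to any average name length; the constants (4 GB limit, 6-byte overhead, 2 bytes per Unicode character) come straight from the slide:

```python
# Rough capacity check for the 4 GB dimension string store.
# Constants are taken from the slide: 6-byte per-string overhead,
# Unicode storage at 2 bytes per character.
STORE_LIMIT = 4 * 1024**3   # 4 GB string store limit, in bytes
OVERHEAD = 6                # per-string overhead, in bytes

def max_members(avg_name_chars: int) -> int:
    """Upper bound on member count for a given average name length."""
    bytes_per_member = OVERHEAD + 2 * avg_name_chars
    return STORE_LIMIT // bytes_per_member

print(max_members(50))   # ~40.5 million members for 50-character names
```

A 100-character name roughly halves the ceiling, which is why the slide warns about the size of other string properties and attributes.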


Page 20: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability - Processing

• Scale up vs. scale out
  • Scale up = big machine; scale out = many machines
  • Today, officially, only scale up is supported

• Scale out
  • Better economics
  • Can offer better flexibility of machine usage
  • The AS team is considering delivering a supported method


Page 21: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability - Tools

• Goal
  • Drive the server with a workload of <n> users
  • Measure average throughput (queries/sec)
  • Measure average response time (sec/query)
  • Example criterion: can support <n> users such that average response time < 15 seconds

• Next slide – example graph
  • Each point is a 15-minute run


Page 22: Designing, Building, and Maintaining Large Cubes using Lessons Learned
Page 23: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability - Tools

• Some available tools
  • VBScript – per client, parse MDX text files, execute queries, log to a CSV file, analyze with Excel
  • VSTS – Visual Studio Team System
    • Framework to run multi-user scenarios, record perfmon counters, ramp up users
    • There is a sample on http://www.codeplex.com; not production quality yet, but a good start
  • LoadRunner
  • ASCMD utility, soon to be updated with some additions for multi-user testing
  • Roll your own …


Page 24: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability - Tools

• Input to the VSTS tools
  • Number of clients (users) to simulate
  • A query file for each client (user) – each represents a sequence of user actions
  • Time to run – e.g. 15 minutes
  • Think time – e.g. random between 10 and 20 seconds

• Output
  • Average throughput – queries/sec
  • Average response time – sec/query
  • Perfmon counters

• Issue: Is it realistic (representative)?
• Issue: Any problems? (Could be many.)


Page 25: Designing, Building, and Maintaining Large Cubes using Lessons Learned

What effect do we see?


Page 26: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Another view of same, what is the effect?


TIP: CREATE CACHE or warm-up queries can help after server startup
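The tip above can be sketched as an MDX warm-up statement run after server startup; the cube, hierarchy, and measure names here are hypothetical:

```mdx
-- Hypothetical warm-up: pre-populate the data cache for common query regions
CREATE CACHE FOR [Sales] AS
  '( { [Date].[Calendar Year].MEMBERS },
     { [Geography].[Country].MEMBERS },
     { [Measures].[Sales Amount] } )'
```

Alternatively, replaying a handful of representative user queries (e.g. via the ascmd utility on a startup schedule) achieves the same cache-warming effect.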

Page 27: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Multi user query load testing

• Things to watch out for
  • Duplicate queries (VSTS sample – 18,000 queries yields only about 2,000 unique)
  • ASQueryGenerator, and documents with examples of how to create template queries that reduce duplicate and empty queries, are due for an update soon
  • Empty results (think of cube sparsity) – real users rarely look at empty regions (e.g. Canada swimsuit sales in December)
  • Caching (applies to the real world, but can skew results good or bad)
  • Think time
  • Anything else you can think of looking at …


Page 28: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Multiuser scalability

• Load considerations
  • Process a partition while under load
  • Writeback
  • Proactive caching (real time)
  • Force-commit timeout
  • Ramp-up effects (connection, query warm-up time)
  • End effects (include measurements when the last users finish?)
  • No one-touch tool to examine effects
  • WAN (wide area network) simulation


Page 29: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability options

• Performance less than expected?
• I/O bound?
  • Look at the disk system, including controllers and the number of disks per controller.
  • Direct-attach vs. SAN choices.
  • Use SQLIO to measure the hardware.
  • SSAS partitions, aggregations, 64-bit vs. 32-bit
• CPU bound?
  • Scale up (bigger machine) or scale out (more machines, use replication)
  • SSAS calculations
• Look at queries.
  • Think about reasonability. It is not a black box: break it down into components, try removing components/factors, isolate
  • See the "Diagnosing Query Performance" paper


Page 30: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scale Up for Queries

• Scale up = more CPUs
  • Note that 4-socket machines are relatively cheaper than >4-socket machines; cost is not a linear factor
  • Benefits scenarios that are CPU bound and parallel
• Some customers are led to believe (wrongly) that a bigger machine will improve every query, even a formula-engine calculation running by itself
• Presently the only way to improve parallel processing


Page 31: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scale Out for Queries
• With clustering you ensure high availability; with scale out you optimize query performance.

• You can set up multiple query-only AS databases on multiple AS servers to handle a larger number of concurrent users.

• NLB or other TCP/IP load balancing is typically used.

• You can find more info in the "Scale Out Querying with Analysis Services" whitepaper.


Page 32: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Scalability

• Looking back at multiuser load testing
  • Part of a loop: understand what is happening, go back to goals, investigate where the time is going, consider whether it makes sense
• Solutions:
  • Avoid time-expensive operations
  • Buy more hardware (scale up / scale out)
  • Limit project goals (e.g. standard reports show less information); revisit alignment with business goals
  • Rewrite MDX queries; change the design strategy (e.g. one cube for semi-static analysis, another for 15-minute updates)
• Load testing and analysis can take 50% or more of project time. A quick POC approach might uncover some issues early.


Page 33: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Agenda

• Design
  • Dimensions
  • Cubes
  • Aggregations

• Build
  • Scalability – Processing
  • Scalability – Queries
  • Scalability options for multi-user queries
  • Best Practices

• Maintain
  • Monitor


Page 34: Designing, Building, and Maintaining Large Cubes using Lessons Learned

What can you do to improve processing performance?
• These best practice recommendations are based on lessons learned from working with many enterprise AS customers.

• The suggestions below can be found in the "Analysis Services Processing Best Practices" whitepaper located on the SQL Server Best Practices web site on TechNet.


Page 35: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Dimension Processing Best Practices
• Add indexes to the underlying dimension tables to help improve the "select distinct" queries generated by AS

• Create a separate table or view for dimension processing so you can optimize specifically for AS dimension processing.

• Set the appropriate values for parallel processing. In general this is 1-2 times the number of CPUs. Testing can help find the optimal value.

• Use XMLA <Parallel> nodes to group processing tasks.
• Use the <Transaction> grouping so that different objects get different transaction commits.
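The last two bullets can be sketched as an XMLA batch. All object IDs below are placeholders, and the transactional grouping is shown via the Batch element's Transaction attribute (one common way to express it):

```xml
<!-- Sketch: two partitions processed in parallel inside one transaction.
     MaxParallel caps concurrency; all IDs are placeholders. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
       Transaction="true">
  <Parallel MaxParallel="4">
    <Process>
      <Object>
        <DatabaseID>MyDatabase</DatabaseID>
        <CubeID>MyCube</CubeID>
        <MeasureGroupID>Sales</MeasureGroupID>
        <PartitionID>Sales_2007Q1</PartitionID>
      </Object>
      <Type>ProcessData</Type>
    </Process>
    <Process>
      <Object>
        <DatabaseID>MyDatabase</DatabaseID>
        <CubeID>MyCube</CubeID>
        <MeasureGroupID>Sales</MeasureGroupID>
        <PartitionID>Sales_2007Q2</PartitionID>
      </Object>
      <Type>ProcessData</Type>
    </Process>
  </Parallel>
</Batch>
```

Splitting objects across separate batches (or transactions) lets a failure in one group commit or roll back independently of the others.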


Page 36: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Agenda

• Design
  • Dimensions
  • Cubes
  • Aggregations

• Build
  • Scalability – Processing
  • Scalability – Queries
  • Scalability options for multi-user queries
  • Best Practices

• Maintain
  • Monitor


Page 37: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Maintenance Issues
Backup and Restore Strategies
• The backup and restore functionality has markedly improved in SQL Server 2005.
  • Note: SQL Server 2008 will introduce a newer version of this feature to help improve scaling for huge cubes.

• Back up the SQL database that holds your OlapQueryLog table.
• You can use the Database Synchronization feature, which allows you to synchronize your database from a primary to a secondary server.

• As with AS 2000, you can copy the full data folder as your backup as well.
  • Note that some of the information is encrypted, so if you are restoring to a different server you will need to manually change connection strings and passwords.
  • You can find more information about this approach in the best practices whitepaper "Scale Out Querying with Analysis Services".


Page 38: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Maintenance Issues
Planning for a possible cube rebuild
• How do you plan for a possible full cube rebuild if you have a catastrophic loss of your Analysis Services database?
  • For starters, partition your data (e.g. by time) if possible, so that you can restore the available time periods while you are busy rebuilding any missing portions.
  • Presuming you have a good backup solution, you can restore most of the data except for the current day. The current day's data can then be rebuilt against the existing data source.


Page 39: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Agenda

• Design
  • Dimensions
  • Cubes
  • Aggregations

• Build
  • Scalability – Processing
  • Scalability – Queries
  • Scalability options for multi-user queries
  • Best Practices

• Maintain
  • Monitor


Page 40: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Performance Monitor

• SSAS 2005 has more perfmon counters.
• The counter "MSAS 2005:Processing\Rows read/sec" is helpful for troubleshooting or optimizing parallel processing. It reports the number of rows per second AS is reading from the relational data source.

• Processing begins by sending a SQL query to get the data that populates each partition.


Page 41: Designing, Building, and Maintaining Large Cubes using Lessons Learned

How to monitor AS performance

• Use SQL Server Profiler to capture key trace events of long-running queries (user or processing).

• Use the Windows Event Log, as many AS events are recorded there.

• The AS Query Log stores internal query information meant to help in defining aggregations later.


Page 42: Designing, Building, and Maintaining Large Cubes using Lessons Learned


Demonstration #2

Profiling SQL Executed Statement

Page 43: Designing, Building, and Maintaining Large Cubes using Lessons Learned

Other AS Monitoring Tools

• Use the ascmd.exe sample application included in the SQL Server SP2 samples. You can use the -T option to output a trace file when running this utility.

• Create a system-wide trace file to record the events (refer to the attached XMLA file).

• Note: the AS Flight Recorder exists to record the last set of events that occurred, in case of a catastrophic event on the server. In the AS properties, check "Show Advanced (All) Properties" and you will see the Flight Recorder properties.


ProcessPartitionAndRunTrace.xmla

Page 44: Designing, Building, and Maintaining Large Cubes using Lessons Learned

SQL CAT Presentations at PASS 2007

Session Code | Session Title | Speakers | Date | Time
DBA-410-M | Designing for Petabyte using Lessons Learned from Customer Experiences | Lubor Kollar; Lasse Nedergaard | 9/19/2007 | 9:45 AM - 11:00 AM
DBA-411-M | Building High Performance SQL system using Lessons Learned from customer deployments | Michael Thomassy; Burzin Patel | 9/19/2007 | 1:30 PM - 2:45 PM
DBA-412-M | ISV configuration & implementation using Lessons Learned from customer deployments | Juergen Thomas | 9/20/2007 | 10:30 AM - 11:45 AM
DBA-413-M | Building Highly Available SQL Server implementations using Lessons Learned from customer deployments | Prem Mehra; Lindsey Allen; Sanjay Mishra | 9/20/2007 | 1:30 PM - 2:45 PM
DBA-416-M | Building and Deploying Large Scale SSRS farms using Lessons Learned from customer deployments | Denny Lee; Lukasz Pawlowski | 9/21/2007 | 9:45 AM - 11:00 AM
DBA-415-M | Building & Maintaining large cubes using Lessons Learned from customer deployments | Nicholas Dritsas; Eric Jacobsen | 9/21/2007 | 1:00 PM - 2:15 PM

Page 45: Designing, Building, and Maintaining Large Cubes using Lessons Learned


Thank you!

Thank you for attending this session and the 2007 PASS Community Summit in Denver