Thursday, August 19, 2010

SSAS 2005: Cube Performance Tuning Lessons Learned

http://weblogs.sqlteam.com/derekc/archive/2008/10/28/60747.aspx

Intro.
A recent project has forced me (which is a good thing) to learn both the internals of SSAS 2005 as well as various performance tuning techniques to get maximum performance out of the OLAP server. It goes without saying that the grain of both your underlying data warehouse's Dimensions & Facts will drive how large your cubes are (total cube space). It also should be a given that Processing Time & Query (MDX) Execution Time usually compete with one another. Given the same grain of a model, the more Grain Data, Indexing, and Aggreggations you process upfront should generally result in a more performant end-user experience (but not always). And while ETL & Cube Processing time is of importance, in the real-world it is the end-user experience (capability and performance) which drives the DW/BI solution's adoption!
Throw-out unused Attributes/Optimizing Attributes/Leverage Member Properties
The more dimensional attributes you create infers a larger cubespace, which also means more potential aggregations and indexes. Take the time to review with your clients the proposed set of attributes and be sure they all provide value as a 1st class Dimension Attribute. Also, if you find attributes are used primarily for informative purposes only consider leveraging Member Properties instead of creating an entire Dimension Attribute. Also, the surrogate key for your dimensions almost never add business value, delete those attributes and assign the keyColumns of your dimension's grain member (ie it's lowest level) attribute to the surrogate key. If an attribute participates in a natural hierarchy but is not useful as a stand-alone attribute hierarchy you should disable it's hierarchy via the AttributeHierarchyEnabled setting. Finally, be aware that if you have a 'deep' dimension (ie like 19 million members) at its lowest grain, any additional attributes you add will incur much overhead as they have a much higher degree of cardinality.
Set Partition Slices
The question of whether or not you must explicitly set a partition's SLICE property is clearly documented incorrectly in SQL Server 2005 Books Online (BOL). For all but the simplest partition designs you should consider setting the SLICE property to match the source property (ie the dataset definitions should match across both properties). For those who do not know, a partition's SLICE is useful for query execution purposes, the SLICE tells the formula|storage engine which partition(s) hold the data that is being requested of it. Please see resources section below for more information on this.
Optimizing Attribute Relationships
Attribute relationships are the developer's mechanism to inform the OLAP server of the relation between attributes. Just like Fact Tables (measure groups) relate to dimension in various ways (Regular, Referenced, Fact, Many-to-Many), dimension attributes can relate to one another in various forms (One-to-One or One-to-Many).Also, you can set the RelationshipType to Flexible or Rigid. If you know your member's change over time (ie reclass), make sure to leave this setting as Flexible, otherwise set it to Rigid. Take the time to thoroughly review your attribute relationships and ensure that both represent their natural hierarchy and are optimized!
Scalable Measure Group Partitioning & Aggregation Strategies
This is one of the better known techniques but it is still of utmost importance. Make sure to design your measure group's partitions to optimize their performance (both processing and query execution). If your system has a 'rolling window' requirement also account for this in your ETL design/framework. You should almost always partition your measure groups by the DATE_KEY and match the underlying relational data warehouse (RDW) partitioning scheme. The basics of this is your 'hot' (the current period) partition should be optimized for query-execution time via setting a different Aggregation Design as opposed to the 'colder' (older) partitions. Also, if your main storage device (ie SAN usually) cannot hold all of your required data, consider leveraging Remote Partitions to offload the extreme 'cold' partitions to slower storage.
Continuously Tune Aggregations Based On Usage
Most documentation in the community clearly states the order of creating effective aggregations is to first leverage the Aggregation Design Wizard, enable the Query Log, and then re-tune the aggregations using the Usage-Based Tuning Wizard. What is not mentioned (near enough anyway) is to continuously retune your aggregations using a refreshed Query Log using the Usage-Based Tuning Wizard. By doing so you are ensuring your aggregations are periodically revised based up recent, real-world usage of your cubes.
Warming the Cache
Another well known technique...by issuing real-world MDX queries onStartUp of the MSOLAP service your cube's cache will be pre-optimized.
Be Mindful of Many-to-Many Dimensions Performance Implications
While M:M dimensions are a powerful feature of SSAS 2005, that power comes at the cost of query-execution time (performance). There are a few modeling scenarios where you almost have to leverage them but be aware that if you are dealing with large amounts of data this could be a huge performance implication at query-time.
Control of the Client Application (MDX): That is the Question
A lesser discussed matter yet still very important is how much control you have over the MDX issued to your cubes. For example, Excel Pivot Tables and other analytical tools allow the user to explore your cubes with freedom pending the security (no Perspectives are not a security measure). If you can write (or control) the MDX being issued by the end-user then obviously you have more control to ensure that actual MDX is optimized.
Got 64-Bit? Multi-Cores?
For enterprise-class Microsoft DW/BI engagements forget about x86/single-core, period. Analysis Services can chew through (process) more data, in higher-degrees of parallelization with x64 multi-core CPUs. Storage|Formula engine cache rely on memory...long-story short, Analysis Services has been designed to perform at higher levels of scalability when running on multi-core/x64 CPUs. Also, be sure to set Analysis Service's Min/Max Thread settings properly for both Query & Processing.
Conclusion
I am dedicated to life-long learning. I cannot take full credit for my content above as much of this knowledge was the work of others as well as my own testing. The resources section listed below gives credit where it is due accordingly. Take the time to learn and implement highly-performant SSAS 2005 cubes to ensure your project's stakeholder’s first query is a performant one!
Resources
Mosha Pasumansky's Blog (MDX 'Father'): http://sqlblog.com/blogs/mosha/default.aspx
Microsoft SQL Server 2005 Analysis Services (best SSAS 2005 OLAP internals book out!) by SAMS Publishing: http://safari.samspublishing.com/0672327821
SQL Server Analysis Services 2005 Performance Tuning Whitepaper (a great tuning document): download.microsoft.com/download/8/5/e/85eea4fa-b3bb-4426-97d0-7f7151b2011c/SSAS2005PerfGuide.doc
HP Solutions with Microsoft SQL Server: http://h71028.www7.hp.com/enterprise/cache/3887-0-0-0-121.html

Analysis Services Performance - Quick

http://www.olapoffice.com/products/mni-product-ms/mni-product-ms-analysis-services-performance

Analysis Services has always has been very good cube performance and it handles sparsity exceptionally well. However, with writeback there have been some considerations that needed to be taken into account when creating solutions. These are discussed here.
Below is diagram of how the cube writeback functionality has progressed within SQL Server versions 2000 -> 2005 -> 2008. We would encourage all uses to upgrade to SQL 2008 to take advantage of the performances gain as it opens up a raft of applications and how they are designed. For more information please contact OLAP Office to discuss further.
analysis_services_olap_writeback

SQL 2008 Cube Writeback Explanation

  • Values are retrieved directly into the worksheet from the cube.
  • If a change is made to a cube multi-dimensional cell then 
    • the cube is updated with the actual value
    • the incremental change between the old cube net value and new cube net value is reflected in the "writeback" table. This is the audit trail of transactions that tells you who has changed what and when.

SQL 2000/2005 Cube Writeback Explanation

  • Values are retrieved into the worksheet by summing 
    • the retrieved value from the cube and
    • the retrieved incremental changes to the respective multi-dimensional cells.
  • If a change is made to the cube then the incremental change between the old net value and new value is reflected in the "writeback" table. This is the audit trail of transactions that tells you who has changed what and when.
Note: To maintain cube performance where writeback is involved with SQL 2000 and 2005 you need to keep the writeback tables at a minimum to optimize performance. This is not the case for 2008.