Tuesday, August 10, 2010

Optimizing Time-Based Calculations in SSAS

SQL Server Analysis Services (SSAS) provides built-in MDX functions to facilitate time-based calculations. For example, these functions are often used for period-to-period and same-period-last-year comparisons. (If you're unfamiliar with these MDX functions and calculations, see the web-exclusive article "The Concepts Behind Time Calculations in SSAS". Although approaches to leverage the MDX functions exist, they have shortcomings. I developed an approach that overcomes the shortcomings and strives to provide end users with calculations that are flexible, simple to use, and offer good performance.
I'll begin by describing commonly used approaches to time-based calculations (which involve a time utility dimension) and the shortcomings of those approaches. I'll then walk you through the approach I developed. Although I use SQL Server 2008 in the examples, my approach also works with SQL Server 2005.
Commonly Used Approaches That Involve Time Utility Dimensions
As I discuss in my web-exclusive article "The Concepts Behind Time Calculations in SSAS," adding distinct time-based calculations for each measure (and applicable time-based dimension attribute/hierarchy) can result in bloated, unwieldy cubes. A time utility dimension can be used instead.
In 2002, I came across the concept of a time analysis utility dimension (now typically referred to as time utility dimension) while reading the first edition of George Spofford's MDX Solutions (John Wiley & Sons, 2001). In Spofford's approach, you manually add a new single-member dimension (with a static value such as [Current]), then add a calculated member to the dimension for each type of desired analysis (e.g., period-to-period, same-period-last-year, year-to-date). The result is a flexible cube that has a small set of calculated member definitions.
For example, look at the pivot table in Figure 1.


Figure 1: Pivot table from a cube in which a time utility dimension was manually added (click to enlarge)


This pivot table is tracking two economic indicators: unemployment and consumer price index (CPI). A Time dimension hierarchy is displayed along the rows of the pivot table, and a Time utility dimension is displayed along the columns. The two columns underneath the Current member display the actual measure values for a given date, while the PriorPeriod and YearAgo columns represent calculated members.
The clever part of this approach is that the Current, PriorPeriod, and YearAgo members are generic (i.e., not explicitly defined against any particular measure) and defined only once. Measure values can be switched in and out of this pivot table. When you do so, the Current, PriorPeriod, and YearAgo values will update accordingly.
Let's walk through an implementation of a time utility dimension built directly into SSAS: the Business Intelligence Wizard's "time intelligence" option. (The Business Intelligence Wizard was first introduced in SSAS 2005.)
If you'd like to follow along, you can download the files in Chessman_TimeAnalysis_CodeSamples.zip by clicking the 103355.zip hotlink near the top of the page. The underlying data, which comes from the Organisation for Economic Co-operation and Development, compares economic trends across different countries.
Start by restoring the SQL Server 2008 relational database from the DebtAnalysis.bak backup file. Next, open the OLAP_GovtData.dwproj project in Business Intelligence Development Studio (BIDS). Edit the project's target server property to match the location of your SSAS instance. Finally, after editing the DebtAnalysis.ds data source to reflect the location of the DebtAnalysis relational database, process the entire database using the Database menu's Process option.
With the OLAP_GovtData project open in BIDS, make a copy of the GovtDebtAnalysis cube. (Alternatively, you can see the output in the GovtDebtAnalysis_BIW.cube file in Chessman_TimeAnalysis_CodeSamples.zip.) Open the copied GovtDebtAnalysis cube in the BIDS designer. On the Calculations tab, select the Add Business Intelligence option from the Cube menu to start the Business Intelligence Wizard. After navigating past the Welcome window, select Define time intelligence in the Choose Enhancement window, as Figure 2 shows.


Figure 2: Selecting the Define time intelligence option (click to enlarge)


Next, choose the [H-Year] hierarchy and select the following check boxes in the Available time calculations section:

  • Year Over Year Growth
  • Quarter Over Quarter Growth
  • Month Over Month Growth
Finally, in the Define Scope of Calculations window, select the applicable measures to be included in the calculations. In my example, I selected all the measures in the wizard. (Note that in the cube's calculations script, I manually removed the measures for the Fact OECDGDP measure group from the quarterly and monthly growth calculations because these measures are defined at an annual grain—i.e., detail at the annual level.) Click Next to review the changes the wizard will implement, then click Finish to implement them. The major changes implemented by the Business Intelligence Wizard include the following:

  • A new named calculation (H-Year DimDate Calculations 1) is added to the dsv_DebtAnalysis data source view with a static value of [Current DimDate]. Likewise, an attribute (also named H-Year DimDate Calculations) is added to the DimDate dimension.
  • A calculated member is created for each time calculation you selected in the wizard. Each member is initially defined with a value of NA.
  • Each calculated member is associated to specific measures and cube cells (specifically, along the attributes in the DimDate dimension) through assignments.
Figure 3 shows a pivot table from the resulting cube. The time utility dimension and unemployment rates are displayed in the columns. The H-Year hierarchy is displayed in the rows.


Figure 3: Pivot table from a cube in which a time utility dimension was added using BIDS (click to enlarge)


The Shortcomings
In versions earlier than SSAS 2005, time dimensions were relatively easy to work with. SSAS 2005 and its introduction of the Unified Dimensional Model (UDM) changed everything. Today, dimension attributes, multiple dimension hierarchies, and role-playing dimensions make it difficult to implement flexible and comprehensive time-based calculations.
Most of the time-based calculation examples I've encountered ignore these issues. Examples in which time utility dimensions are manually added are typically limited to a single dimension hierarchy. The built-in Business Intelligence Wizard doesn't effectively address these challenges either because it's limited to building calculations for either a single-attribute or single-dimension hierarchy. Furthermore, the Business Intelligence Wizard has relatively weak error-handling capabilities to account for missing data along a time dimension.
A New Approach
My approach to time-based calculations involves manually creating a time utility dimension. It's based on concepts from David Shroyer's white paper "A Different Approach to Implementing Time Calculations in SSAS", ideas from Mosha Pasumansky's blog "Time Calculations in UDM: Parallel Period", and the results of my own research. At a high level, I used Shroyer's framework for creating a time utility dimension. I integrated performance optimizations from Pasumansky's blog and used an explicit attribute-based method of scoping the calculations across the time dimension (and any role playing dimensions based on the time dimension).
To walk through this new approach with me, make a copy of the GovtDebtAnalysis cube in the OLAP_GovtData project. (Alternatively, the GovtDebtAnalysis_TUD.cube file in Chessman_TimeAnalysis_CodeSamples.zip contains a completed implementation.) Then follow these steps:

  1. Add a new dimension attribute to the existing time dimension.
  2. Create a new dimension.
  3. Add the new dimension to the cube.
  4. Add the time calculations to the cube.
  5. Test the cube.
Step 1. Add a New Dimension Attribute
The first task in adding a new dimension attribute to the existing time dimension is to add a new named calculation to the DimDate table in the dsv_DebtAnalysis data source view. Name it TimeCalcs and assign it the static expression value of N'Current'. Next, add TimeCalcs as an attribute to the DimDate dimension, setting the AttributeHierarchyVisible property to False. This attribute isn't used in the DimDate dimension, but it's needed for configuring a referenced dimension relationship in Step 3.
Step 2. Create a New Dimension
Now you need to create a new shared dimension. Right-click Dimensions in Solution Explorer, and select the New Dimension menu item to start the Dimension Wizard. As you work through the Dimension Wizard, choose the Use an existing table option, select DimDate as the main table, and select the named calculation (i.e., TimeCalcs) as both the Key column and Name column. Name your dimension DimTimeCalcs.
The DimTimeCalcs dimension will be used for calculations, not aggregations. Therefore, we don't want an All level in the dimension. With your DimTimeCalcs dimension open in the editor, select the TimeCalcs attribute. Change the IsAggregatable property to False and change the DefaultMember to [Current], as Figure 4 shows. Save the changes and close the dimension designer.


Figure 4: Changing the IsAggregatable and DefaultMember properties (click to enlarge)


Step 3. Add the New Dimension to the Cube
The next step is to add the new DimTimeCalcs dimension to the cube. In Solution Explorer, double-click the cube to open it. Right-click anywhere on the Dimension Usage tab and select Add Cube Dimension. For each measure group, set the relationship type to Referenced, as seen in Figure 5.


Figure 5: Configuring the relationship type of each measure group (click to enlarge)


After you're done configuring the measure groups' relationship type, highlight the new dimension in the Dimensions Usage tab and set the HierarchyUniqueNameStyle property to ExcludeDimensionName. Because this dimension doesn't contain user-defined hierarchies, you can simplify any MDX reference to include just the attribute name.
Step 4. Add the Time Calculations to the Cube
You're now ready to add the time calculations to the cube. Select the Calculations tab and switch to the Script view. First, you need to add and define the new calculations. For example, the following code adds the PriorPeriod and YearAgo calculations to the [TimeCalcs] dimension attribute of the DimTimeCalcs dimension, and defines their default value as Null:

CREATE MEMBER CURRENTCUBE.
  [TimeCalcs].[PriorPeriod] AS Null;
CREATE MEMBER CURRENTCUBE.
  [TimeCalcs].[YearAgo] AS Null;
(Although these lines wrap here, you'd enter each CREATE MEMBER statement on one line.)
Next, you need to limit, or scope, the calculations to each desired dimension attribute in the DimDate dimension, as Listing 1 shows.


Listing 1: Code That Scopes the PriorPeriod Calculation


Notice that this code defines only a PriorPeriod calculation and not a YearAgo calculation for the Century dimension attribute. Including a YearAgo calculation doesn't make sense given that this dimension attribute's context is century. However, for many of the other attributes, it does make sense, so you need to define both the PriorPeriod and YearAgo calculations, as shown in Listing 2.


Listing 2: Code That Scopes the PriorPeriod and YearAgo Calculations


To scope the YearAgo calculation for the Calendar Qtr dimension attribute, you can simply use MDX's Lag function with an index of 4, as callout A in Listing 2 shows. I prefer using the Lag function rather than MDX's ParallelPeriod function because ParallelPeriod requires a hierarchy level as an input.
In case you're unfamiliar with the Lag function, it returns the member that's the specified number of positions before the specified member. In this case, the specified member is the current quarter ([DimDate].[Calendar Qtr].CurrentMember) and the index is 4, so the lag function returns the quarter that was four quarters ago (i.e., a year ago). Alternatively, you could use the Lag function with the Month dimension attribute in code such as

[DimDate].[Month].CurrentMember.Lag(12)
to return a year-ago value. However, you shouldn't use the Lag function with the Date dimension attribute to obtain a year-ago value. Code such as

[DimDate].3/17/2010 11:42:31 AM.CurrentMember.Lag(365)
typically doesn't work due to leap years. In this instance, you'd need to use the ParallelPeriod function. None of the measure groups in my cube have a daily grain, so I didn't use the ParallelPeriod function.
You can find all the code needed for step 4 in the MDX script in the GovtDebtAnalysis.cube file in Chessman_TimeAnalysis_CodeSamples.zip. After you've finished scoping the time calculations for each desired attribute, you need to process the cube.
Note: There's a shortcut you can take to scope all the attributes participating in a dimension hierarchy. I included an example in the comments of the MDX script in the GovtDebtAnalysis_TUD cube, and you can see the same approach in the MDX code generated by the Business Intelligence Wizard. I decided to define calculations one attribute at a time because, for large cubes, it can be difficult to keep track of which attributes do and do not participate in a hierarchy.
Also, for some types of time-based calculations (e.g., PeriodsToDate), you need to explicitly define and scope multiple calculations for an attribute participating in multiple hierarchies. For example, a PriorPeriod or YearAgo calculation should be the same regardless of hierarchy, but a year-to-date calculation will be different when querying a Calendar versus Fiscal hierarchy.
Step 5. Test the Cube
Before you let end users query the cube, you should test the cube by using the browser in BIDS to simulate an end user or by creating a connection to the cube from an application such as Microsoft Excel. If you use BIDS, you might encounter an "unknown error" on certain navigation paths because of a bug in the BIDS browser.
These errors don't show up in Excel. If you're using Excel, make sure you go into the PivotTable options and select Show calculated members from OLAP server on the Display tab.
Figure 6 contains a sample pivot table that shows historical unemployment rates in the United States.


Figure 6: Pivot table from a cube in which a time utility dimension was added using the new approach


The TimeCalcs dimension is displayed in the columns. The PriorPeriod and YearAgo calculations are properly scoped at yearly, quarterly, and monthly levels. If you have Excel 2007, you can open the pivot table spreadsheet (Excel_PivotTables.xlsx) and explore the various measures, date attributes, and date hierarchies.
Note that the GovtDebtAnalysis cube you just created doesn't contain role-playing dimensions. If you have time-based role-playing dimensions (e.g., OrderDate, ShipDate) in a cube, you can copy and paste all the scoped calculation definitions, then replace the dimension name with the role-playing dimension name. Because my approach uses a referenced time utility dimension rather than using the [TimeCalcs] attribute directly from the date dimension, the calculations will work across all the role-playing dimensions.
Benefits of the New Approach
So what can your end users gain from all this effort? Consider the original goal of providing end users with flexibility, simplicity, and good performance:

  • Flexibility. Regardless of the measures, date attributes, or date hierarchies being queried, the [Current], [PriorPeriod], and [YearAgo] calculations are automatically computed and appropriately displayed. Other approaches to time utility dimensions don't do this.
  • Simplicity. Only two calculated members were added to provide time-based analysis across the entire cube, avoiding the problem of calculated measure explosion inherit in traditional approaches. (For an example of calculated measure explosion, see "The Concepts Behind Time Calculations in SSAS".)
  • Good performance. The MDX script contains no runtime-dynamic statements (e.g., CASE, IIF). Instead, the script uses static scoping to avoid potential performance issues. It also efficiently eliminates erroneous calculations at the start and tail end of the date dimension. For example, in Figure 6, notice that years without unemployment rates (1790-1989) don't appear in the pivot table and the PriorPeriod and YearAgo results for the first year containing data (1990) correctly show a null value.
Give It a Try
I encourage you to try this new time-calculation approach in your existing cubes and any new cubes you need to create. Your end users will be more productive and independent. Although the implementation involves a bit of up-front effort, it should reduce the amount of overall time spent on custom calculations.

No comments: