Tuesday, November 16, 2010

Maximizing Attribute Relationship Aggregations in Changing Dimensions

When designing your SQL Server 2005 Analysis Services (SSAS) dimensions and related attributes—appropriately called attribute relationships—you should consider how changes in your dimension data will affect the underlying dimension aggregations and, in turn, processing and query performance. If you aren’t careful in defining attribute relationships, you’ll likely run into dimension processing problems such as processing errors—and seemingly at the most inconvenient time. In this article, I explore the attribute relationship property called RelationshipType, which defines whether SSAS persists aggregations when it processes the dimension. If changes such as inserts or updates are occurring in your dimension tables, read on because you’ll want to optimize this property to maximize how your cube stores aggregations. I also walk through different types of dimension changes and their effect on the RelationshipType property so that you can optimize this setting whenever possible.
Performance Potholes
I remember my first taste of dimension processing with changing dimension data. That was in SQL Server 7.0, in the early days of SSAS, which was called OLAP Server back then. I had created a dimension based on a table whose columns experienced daily updates. When I processed the dimension, I received a “duplicate dimension” processing error. I thought, “What? I don’t have duplicate dimension records in my table!” But the problem wasn’t my dimension table; it was that my dimension in OLAP Server wasn’t marked as a changing dimension. When OLAP Server processed the dimension, all existing rows that had changed looked like new records to the dimension. And the server saw a new row with the same surrogate key as an existing row—a “duplicate.”
I set the dimension as a changing dimension, and the error disappeared, but so did some supporting aggregations and, in turn, performance. With changing dimensions in pre-SSAS 2005 versions, whenever you processed the dimension, all cube aggregations would be dropped and then slowly built again—a process called lazy aggregations.
In SSAS 2005, however, the changing-dimension concept is now part of the AttributeRelationship object’s RelationshipType property. This design has two performance impacts. First, when SSAS drops the aggregations and until it rebuilds them, queries suffer because the SSAS engine must calculate the query results by computing the values from either the underlying data or other aggregations. Second, when lazy aggregations are running, generating and storing the aggregations requires server resources, adding to the overall server load.
You can set the RelationshipType property to Flexible or Rigid, depending on whether the data from the associated attribute changes in relationship to the attribute relationship. The default setting is Flexible. If all the RelationshipType settings in an aggregation are set to Rigid, SSAS can persist the aggregation even when it processes the cube. Let’s look at how to determine the correct setting for the RelationshipType property, with the goal of using the Rigid setting as much as possible, which isn’t always easy.
Defining Attribute Relationships
Figure 1 shows the Product dimension in the AdventureWorks cube, with the Model Name and Subcategory attribute relationship highlighted. Also highlighted is the RelationshipType property, currently set to Flexible.

Figure 1: Product dimension in the AdventureWorks cube, with the Model Name and Subcategory Attribute Relationship highlighted.
If you follow SSAS optimization best practices (such as the ones in the white paper “Microsoft SQL Server 2005 Analysis Services Performance Guide,” which you can download at http://www.microsoft.com/technet/prodtechnol/sql/2005/ssas2005perfguide.mspx) chances are you’re familiar with two basic optimization techniques for dimensions:
  • Create attribute relationships above the key attribute that follow the natural hierarchies of your dimensions.
  • Set the RelationshipType property of the attribute relationships to Rigid when the data doesn’t change between related attributes.
The confusion comes when you combine these two techniques and your extraction, transformation, and loading (ETL) process modifies your dimension table by adding new records, adding Type 2 historical dimension changes (which track the history of changes for an attribute), and performing Type 1 changing updates when some attribute values are overwritten.
So here’s the big question: When should you set the RelationshipType property to Flexible versus Rigid in your dimensions when you have dimension changes occurring and natural hierarchies built? The answer: It depends on what kind of changes are happening and where!
For example, if you’re tracking Type 2 historical changes of an attribute such as Model Name that’s tied to the key attribute, then you change the RelationshipType property to Rigid. But if your Type 2 historical change is happening to an attribute such as SubCategory that’s not directly related to the key attribute but is related instead to a higher level attribute, then the RelationshipType property needs to remain Flexible so that the change doesn’t cause an error.
Thus, the basis for determining whether to set the RelationshipType property to Rigid or Flexible is whether the keys of the related attributes change relative to one another. To determine a change, look at whether the “key” relationships (that is, the attribute key values) change—not whether there are changes to just the name property as attributes related to one another. Confused? Stay tuned.
Type 2 Historical Changes
Let’s look at an example for Type 2 historical changes. The Model Name attribute uses the Model Name column for both the key and the name property. In addition, this attribute is tied directly to the key attribute of the dimension, which is the surrogate key. Let’s assume that Model Name is classified as a Type 2 historical attribute and that Model Name can change for a record. Because a Model Name change is a Type 2 change, a new record will be created when the Model Name changes for a record.
In the following sample dimension table, a new Model Name is assigned to an existing product, which creates a new record for that product.
Product Key Product Name Product ID Model Name (the attribute’s key)
210 HL Road Frame – Black, 58 FR-R92B-58 HL Road Frame
607 HL Road Frame – Black, 58 FR-R92B-58 HL Frame
Notice that a new surrogate key is also created for the new Model Name, and the old surrogate key is still tied to the old Model Name. However, because the Model Name didn’t change in relationship to the surrogate key (the key of the key attribute), the relationship isn’t changing, and you can set the RelationshipType property to Rigid for the Model Name attribute relationship. The principle to follow here is that you can set the RelationshipType property of Type 2 historical attribute relationships to Rigid if the attribute relates directly to the key attribute.
Now let’s look at an example of Subcategory tracked as a Type 2 historical attribute in relationship to Model Name. In Figure 1, Subcategory is defined as an attribute relationship of Model Name. If the Subcategory changes for a product, a new record will be generated for the product, as the following table shows.
Product Key Product Name Model Name
(the attribute’s key)
Subcategory Key Subcategory
210 HL Road Frame – Black, 58 HL Road Frame 14 Road Frames
608 HL Road Frame – Black, 58 HL Road Frame 17 Touring Frames
If you define the RelationshipType property of the Subcategory attribute relationship as Rigid, your processing will fail with the error “Rigid relationships between attributes cannot be changed during incremental processing of a dimension,” as Figure 2 shows.

Figure 2: Processing errors when the RelationshipType property is set incorrectly
Why the error? Because the Model Name attribute in relationship to the Subcategory attribute changed. The new record had the same Model Name, but the Subcategory ID was different. Although this was a new record, the Model Names were the same for both records, and the Subcategory was related to the Model Name as an attribute relationship, so the different Subcategory ID signified a change. The principle in this case is that you need to set the RelationshipType property to Flexible for attribute relationships that don’t relate directly to the key attribute but that undergo a Type 2 historical change.
Type 1 Changes
Type 1 changing attributes are much easier to understand than Type 2 changes. If you have an attribute whose key column is updated by your ETL process as a Type 1 update, you need to set its RelationshipType property to Flexible for all attribute relationships that it participates in at any level.
The only exception to this rule is when you’re updating only the column used as the name property of the attribute and not the key column. In other words, the key of the attribute doesn’t change, only the name changes in relationship to the related attribute. An example of this situation would be if the name of a Subcategory changes but not its key. In the following table, for instance, the Subcategory changes from Road Frames to Road Bike Frames, but the Subcategory Key remains 14 for the new record.
Original Record:
Model Name Subcategory Key Subcategory
HL Road Frame 14 Road Frames
New Record:
Model Name Subcategory Key Subcategory
HL Road Frame 14 Road Bike Frames
As long as the Subcategory Key doesn’t change in relationship to the Model Name, you can define this relationship as Rigid. The principle here is that for Type 1 changing attributes where only the name column changes, you can set the RelationshipType property to Rigid—as long as the key column for the attribute doesn’t change in relationship to the related attribute.
When a Complete Hierarchy Moves
One final note of interest: You can have a lower level attribute that changes and still have a higher level attribute in the same attribute relationship path that’s defined as Rigid, provided the relationships between the higher level relationships don’t change. For example, if Category and Subcategory never change in relationship to each other, even though a Model Name can be assigned to different Subcategories, you can still define the RelationshipType property for Category and Subcategory as Rigid. Consider the following table example, showing original and new records, with the changing record highlighted.
Original Records:
Model Name Subcategory Key Subcategory Category Key Category
ML Bike Frames 14 Road Frames 3 Accessories
HL Road Frame 14 Road Frames 3 Accessories
HL Touring Frame 17 Touring Frames 2 Components
New Records:
Model Name Subcategory Key Subcategory Category Key Category
ML Bike Frames 14 Road Frames 3 Accessories
HL Road Frame 17 Touring Frames 2 Components
HL Touring Frame 17 Touring Frames 2 Components
The Subcategory name for the Model Name HL Road Frame changed from Road Frames to Touring Frames, but the new Subcategory name didn’t change in relationship to the Category of existing Touring Frame records. It might look like the Category changed from Accessories to Components, but Category remained the same for the Touring Frames Subcategory. Therefore, you can set the RelationshipType property to Rigid for the Category attribute relationship related to Subcategory. However, the Subcategory and Model Name relationship changes, so you would need to set the RelationshipType property to Flexible for the Subcategory attribute relationship related to Model Name. In this situation, you can set the RelationshipType property to Rigid for higher level attribute relationships in a natural hierarchy as long as the attributes (their key columns) do not change relative to one another, even if lower levels in the natural hierarchy can change and are set to Flexible.
Putting It All Together
The result of optimizing the RelationshipType property is that more aggregations will persist even when a dimension is processed, leading to better overall query performance and processing. If you’re lost in the complexities of the changes happening in your dimension data and in how to navigate the RelationshipType setting, you can always try setting all the RelationshipType properties to Rigid, and then change the ones that cause an error to Flexible. But be sure you test all possible changes, or you’ll find yourself dealing with more errors than you bargained for. Better yet, keep this article and its highlighted principles (at right) handy for when you design your dimensions. Either way, you’ll gain the benefits of a cube that’s tuned for performance.
Primary Principles for Setting the RelationshipType Property
  • The basis for determining whether to set the RelationshipType property to Rigid or Flexible is whether the keys of the related attributes change relative to one another. To determine a change, look at whether the “key” relationships (that is, the attribute key values) change—not whether there are changes to just the name property as attributes related to one another.
  • Set the RelationshipType property of a Type 2 historical attribute relationship to Rigid if the attribute relates directly to the key attribute.
  • Set the RelationshipType property to Flexible for attribute relationships that don’t relate directly to the key attribute but that undergo a Type 2 historical change.
  • If you have an attribute whose key column is updated by your ETL process as a Type 1 update, set its RelationshipType property to Flexible for all attribute relationships that it participates in at any level.
  • For Type 1 changing attributes where only the name column changes, set the RelationshipType property to Rigid, as long as the key column for the attribute doesn’t change in relationship to the related attribute.
  • Set the RelationshipType property to Rigid for higher level attribute relationships in a natural hierarchy as long as the attributes (their key columns) do not change relative to one another, even if lower levels in the natural hierarchy can change and are set to Flexible.

No comments: