Performance Potholes
I remember my first taste of dimension processing with changing dimension data. That was in SQL Server 7.0, in the early days of SSAS, which was called OLAP Server back then. I had created a dimension based on a table whose columns experienced daily updates. When I processed the dimension, I received a “duplicate dimension” processing error. I thought, “What? I don’t have duplicate dimension records in my table!” But the problem wasn’t my dimension table; it was that my dimension in OLAP Server wasn’t marked as a changing dimension. When OLAP Server processed the dimension, all existing rows that had changed looked like new records to the dimension. And the server saw a new row with the same surrogate key as an existing row—a “duplicate.”I set the dimension as a changing dimension, and the error disappeared, but so did some supporting aggregations and, in turn, performance. With changing dimensions in pre-SSAS 2005 versions, whenever you processed the dimension, all cube aggregations would be dropped and then slowly built again—a process called lazy aggregations.
In SSAS 2005, however, the changing-dimension concept is now part of the AttributeRelationship object’s RelationshipType property. This design has two performance impacts. First, when SSAS drops the aggregations and until it rebuilds them, queries suffer because the SSAS engine must calculate the query results by computing the values from either the underlying data or other aggregations. Second, when lazy aggregations are running, generating and storing the aggregations requires server resources, adding to the overall server load.
You can set the RelationshipType property to Flexible or Rigid, depending on whether the data from the associated attribute changes in relationship to the attribute relationship. The default setting is Flexible. If all the RelationshipType settings in an aggregation are set to Rigid, SSAS can persist the aggregation even when it processes the cube. Let’s look at how to determine the correct setting for the RelationshipType property, with the goal of using the Rigid setting as much as possible, which isn’t always easy.
Defining Attribute Relationships
Figure 1 shows the Product dimension in the AdventureWorks cube, with the Model Name and Subcategory attribute relationship highlighted. Also highlighted is the RelationshipType property, currently set to Flexible.Figure 1: Product dimension in the AdventureWorks cube, with the Model Name and Subcategory Attribute Relationship highlighted.
If you follow SSAS optimization best practices (such as the ones in the white paper “Microsoft SQL Server 2005 Analysis Services Performance Guide,” which you can download at http://www.microsoft.com/technet/prodtechnol/sql/2005/ssas2005perfguide.mspx) chances are you’re familiar with two basic optimization techniques for dimensions:
- Create attribute relationships above the key attribute that follow the natural hierarchies of your dimensions.
- Set the RelationshipType property of the attribute relationships to Rigid when the data doesn’t change between related attributes.
So here’s the big question: When should you set the RelationshipType property to Flexible versus Rigid in your dimensions when you have dimension changes occurring and natural hierarchies built? The answer: It depends on what kind of changes are happening and where!
For example, if you’re tracking Type 2 historical changes of an attribute such as Model Name that’s tied to the key attribute, then you change the RelationshipType property to Rigid. But if your Type 2 historical change is happening to an attribute such as SubCategory that’s not directly related to the key attribute but is related instead to a higher level attribute, then the RelationshipType property needs to remain Flexible so that the change doesn’t cause an error.
Thus, the basis for determining whether to set the RelationshipType property to Rigid or Flexible is whether the keys of the related attributes change relative to one another. To determine a change, look at whether the “key” relationships (that is, the attribute key values) change—not whether there are changes to just the name property as attributes related to one another. Confused? Stay tuned.
Type 2 Historical Changes
Let’s look at an example for Type 2 historical changes. The Model Name attribute uses the Model Name column for both the key and the name property. In addition, this attribute is tied directly to the key attribute of the dimension, which is the surrogate key. Let’s assume that Model Name is classified as a Type 2 historical attribute and that Model Name can change for a record. Because a Model Name change is a Type 2 change, a new record will be created when the Model Name changes for a record.In the following sample dimension table, a new Model Name is assigned to an existing product, which creates a new record for that product.
Product Key | Product Name | Product ID | Model Name (the attribute’s key) |
210 | HL Road Frame – Black, 58 | FR-R92B-58 | HL Road Frame |
607 | HL Road Frame – Black, 58 | FR-R92B-58 | HL Frame |
Now let’s look at an example of Subcategory tracked as a Type 2 historical attribute in relationship to Model Name. In Figure 1, Subcategory is defined as an attribute relationship of Model Name. If the Subcategory changes for a product, a new record will be generated for the product, as the following table shows.
Product Key | Product Name | Model Name (the attribute’s key) | Subcategory Key | Subcategory |
210 | HL Road Frame – Black, 58 | HL Road Frame | 14 | Road Frames |
608 | HL Road Frame – Black, 58 | HL Road Frame | 17 | Touring Frames |
Figure 2: Processing errors when the RelationshipType property is set incorrectly
Why the error? Because the Model Name attribute in relationship to the Subcategory attribute changed. The new record had the same Model Name, but the Subcategory ID was different. Although this was a new record, the Model Names were the same for both records, and the Subcategory was related to the Model Name as an attribute relationship, so the different Subcategory ID signified a change. The principle in this case is that you need to set the RelationshipType property to Flexible for attribute relationships that don’t relate directly to the key attribute but that undergo a Type 2 historical change.
Type 1 Changes
Type 1 changing attributes are much easier to understand than Type 2 changes. If you have an attribute whose key column is updated by your ETL process as a Type 1 update, you need to set its RelationshipType property to Flexible for all attribute relationships that it participates in at any level.The only exception to this rule is when you’re updating only the column used as the name property of the attribute and not the key column. In other words, the key of the attribute doesn’t change, only the name changes in relationship to the related attribute. An example of this situation would be if the name of a Subcategory changes but not its key. In the following table, for instance, the Subcategory changes from Road Frames to Road Bike Frames, but the Subcategory Key remains 14 for the new record.
Original Record:
Model Name | Subcategory Key | Subcategory |
HL Road Frame | 14 | Road Frames |
Model Name | Subcategory Key | Subcategory |
HL Road Frame | 14 | Road Bike Frames |
When a Complete Hierarchy Moves
One final note of interest: You can have a lower level attribute that changes and still have a higher level attribute in the same attribute relationship path that’s defined as Rigid, provided the relationships between the higher level relationships don’t change. For example, if Category and Subcategory never change in relationship to each other, even though a Model Name can be assigned to different Subcategories, you can still define the RelationshipType property for Category and Subcategory as Rigid. Consider the following table example, showing original and new records, with the changing record highlighted.Original Records:
Model Name | Subcategory Key | Subcategory | Category Key | Category |
ML Bike Frames | 14 | Road Frames | 3 | Accessories |
HL Road Frame | 14 | Road Frames | 3 | Accessories |
HL Touring Frame | 17 | Touring Frames | 2 | Components |
Model Name | Subcategory Key | Subcategory | Category Key | Category |
ML Bike Frames | 14 | Road Frames | 3 | Accessories |
HL Road Frame | 17 | Touring Frames | 2 | Components |
HL Touring Frame | 17 | Touring Frames | 2 | Components |
Putting It All Together
The result of optimizing the RelationshipType property is that more aggregations will persist even when a dimension is processed, leading to better overall query performance and processing. If you’re lost in the complexities of the changes happening in your dimension data and in how to navigate the RelationshipType setting, you can always try setting all the RelationshipType properties to Rigid, and then change the ones that cause an error to Flexible. But be sure you test all possible changes, or you’ll find yourself dealing with more errors than you bargained for. Better yet, keep this article and its highlighted principles (at right) handy for when you design your dimensions. Either way, you’ll gain the benefits of a cube that’s tuned for performance.Primary Principles for Setting the RelationshipType Property
- The basis for determining whether to set the RelationshipType property to Rigid or Flexible is whether the keys of the related attributes change relative to one another. To determine a change, look at whether the “key” relationships (that is, the attribute key values) change—not whether there are changes to just the name property as attributes related to one another.
- Set the RelationshipType property of a Type 2 historical attribute relationship to Rigid if the attribute relates directly to the key attribute.
- Set the RelationshipType property to Flexible for attribute relationships that don’t relate directly to the key attribute but that undergo a Type 2 historical change.
- If you have an attribute whose key column is updated by your ETL process as a Type 1 update, set its RelationshipType property to Flexible for all attribute relationships that it participates in at any level.
- For Type 1 changing attributes where only the name column changes, set the RelationshipType property to Rigid, as long as the key column for the attribute doesn’t change in relationship to the related attribute.
- Set the RelationshipType property to Rigid for higher level attribute relationships in a natural hierarchy as long as the attributes (their key columns) do not change relative to one another, even if lower levels in the natural hierarchy can change and are set to Flexible.
No comments:
Post a Comment