Data Warehousing and Business Intelligence

20 May 2014

Banding and Grouping in Data Warehousing

Filed under: Analysis Services,Data Warehousing — Vincent Rainardi @ 6:53 pm

Reading Chris, Marco, and Alberto’s latest SSAS book (link), I came across banding and grouping. I thought I’d share my own experience about banding and grouping. In this article I would limit it to what they are and how to implement it.

What are they?

Grouping is: classifying text data into groups. For example: when we have 10,000 customers and we want to put them into 10 groups, that’s grouping. Or when we have 300 different occupations, and we want to group them into 15 groups, that’s grouping.

Banding is: classifying numeric data into buckets. For example: when we have have an attribute called Customer Age, and we want to put them into 4 bands (17-25, 26-40, 41-50, 50+), that is called banding. When we have a measure called Purchase Value, and we want to put them into 5 bands (<1k, 1-3k, 3k-5k, 5-10k, 10k+), that’s called banding.

A band can have a name, for example: age 17-25 may be called Young Age, and 41-50 is called Middle Age. A group always have a name. If a band has no name, it is usually named by their values, i.e. “5-10k”.

Banding can be in a dimension or fact table, i.e. the numbers we are putting into buckets can be located in a dimension table or a fact table. But grouping is always in a dimension, never in a fact table.

Grouping can be multi-level, and we can make a hierarchy. For example, for Occupation we can have 2 levels: Level 1 consists of 15 groups and Level 2 consists of 60 groups. For example, we can have a Level 1 Occupation Group called Doctor and Nurses, which consists of 4 Level 2 Occupation Groups: GP, Surgeon, Nurse and Specialist.

An attribute can be grouped in several ways. For example: customers can be grouped by value and by time. So we can create an attribute in the customer dimension called “ABC Group” to group customers into Group A (), Group B

How to implement it?

There are two ways to implement Grouping and Banding: fixed approach and flexible approach.

  1. Fixed approach: create a new column in the dimension table containing the group name.
  2. Flexible approach: create an intermediate fact table. We then use many-to-many relationship in SSAS to connect this new fact table to the main fact table.

The Fixed Approach is used when there is only 1 grouping/banding scheme in the whole company. For example, to band the customer age, all users agree that it should be 17-25, 26-40, 41-50, 50+. Fixed Approach can also be used if there are 2 schemes. Maximum of 3, beyond 3 schemes it becomes impractical.

The Flexible Approach is used when there are many schemes in the company. For example, when we try to get an agreement among the business users to group the occupations, we could not get a concensus between different sales teams. The Flexible Approach enables each user, or each team to create their own scheme.

Fixed Approach

Let’s take Occupation in DimCustomer as an example. We create a new column called OccupationGroup. We modify the ETL to populate this column by reading a staging table which maps each occupation to an occupation group.

For banding, the mapping table in Staging specifies the range, i.e. 17-25, 26-40, 41-50, 50+, and the ETL code looks up this table to determine the correct age band for each row in DimCustomer.

For banding on a measure, we need to create a “transaction dimension” which has a column called Band. For example, for the Purchase Value banding, we create a transaction dimension, one of each purchase, which has a column called Purchase Value Band. The reason for creating this Transaction Dimension is: we should not put a textual attribute in the fact table.

Flexible Approach

For the Occupation example, we create an intermediate fact table with 3 columns: Occupation Scheme, Occupation Group and Customer Key. These 3 columns are:

  • Customer Key is the surrogate key to the customer dimension.
  • Occupation Group contains the name of the group, such as GP, Surgeon, Nurse and Specialist.
  • Occupation Scheme contains the name of the banding scheme, for example: the Financial marketing team would like a banding with 15 occupation groups (scheme 1), the European sales team would like a scheme with 20 occupation groups (scheme 2), and the UK sales team would like a scheme with 25 occupation groups (scheme 3).

The many-to-many feature in SSAS enables the cube users to see only their schemes, because this is limited in the role settings. M2M enables us to “connect” a fact table to a dimension via another fact table. In this case, we are using M2M to connect the main fact table to the customer dimension (Occupation is an attribute of the customer dimension) via the above fact table.

How? On the “dimension usage” tab, the main fact table is connected to the customer dimension on the customer key attribute. The above intermediate fact table is also connected to the customer dimension on the customer key attribute.

Still on the “dimension usage” tab the intermediate fact table is connected to a new dimension called “Occupation Group”. This Occupation Group dimension has a hierarchy containing 2 levels: Occupation Scheme and Occupation Group. When the users put this Occupation hierarchy on the Row Label area of the Excel Pivot Table, they will see Occupation Scheme on column A, and Occupation Group on column B.

They can then put the Occupation attribute from the Customer Dimension on the Row Label area, underneath the Occupation hierarchy. When they do that, they will see the Occupations on column C. They can then put any measures in the “Value” area of the Pivot Table as usual, effectively enable them to slice the measure using the occupation groups.

In the roles, we specify “who can see what”, i.e. the financial marketing team can access scheme 1, the European sales team can access scheme 2, and the UK sales team can access scheme 3. We do this using dimension security. This way, members of the financial marketing team only see the Occupation Groups in Scheme 1, and members of the European sales team only see the Occupation Groups in Scheme 2.

Advantages and Disadvantages

The fixed approach is simple and suitable if we only have 1 or 2 grouping scheme. It does require us to modify the structure of the warehouse and cube though, to add a grouping scheme.

The flexible approach takes longer to build, requires the setting up of ProActive caching in SSAS (which could be tricky), but we don’t need to make any code change to add a grouping scheme. It is suitable for a large group of BI users which every team insists on their own grouping.

 

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: