Data Warehousing and Data Science

31 May 2014

5 minutes SSAS Admin for SQL DBAs

Filed under: Analysis Services — Vincent Rainardi @ 9:45 pm

Often, a SQL DBA has to administer SSAS. Here’s a quick 5 mins guide.

  1. How to process a cube?
    In SSMS, right click cube name, process, OK.
  1. When was the cube last processed?
    File, New, DMX query, select  * from $system.mdschema_cubes
  1. Who is using the cube right now?
    File, New, DMX query, select * from $system.discover_connection
  1. How do I give a user access to the cube?
    Put the user into a new AD group. In SSMS, + on CubeName, right click Role, new role, Membership, Add, type the AD group name, click Cubes on the left pane, Access = Read
  1. How big is the cube?
    The size of this folder: C:\Program Files\Microsoft SQL Server\MSAS11.SSAS_MD\OLAP\Data\CubeName.0.db
    Easier but less accurate: in SSMS, right click cube name, properties, Estimated Size.
  1. How do I deploy cube changes to production?
    In Dev server, right click the Cube name, script database as, create, new window. Change the <Database Name? to “x_New”, and the connection string (Search “ConnectionString” to find it). Save it as x.xmla. Change connection to Production, F5 to execute the script. Right click the new cube, Process, OK. After the new cube is processed, delete the old cube and rename the new cube.
    If you have secondary Prod server, use Synchronize: right click the cube on SSMS, Synchronize. This will copy SSAS files from Secondary Server to Primary Server incrementally (only changed files are copied). You need to process the cube first in the Secondary Server.
  1. How long does it take to process the cube?
    Open SQL profiler, File, new trace, specify SSAS server, Event: Progress Report Begin and End, column filter: DatabaseName like xxx.
  1. How do I process a cube every night?
    In SSMS, right click cube name, process, script. Create a new SQL Agent job, new job step, type = AS Command, command = paste the script.
  1. How do I track who’s using the cube every day?
    Setup query log table. See this article: link
  1. Server spec for SSAS?
    Start with 16 GB, quad core, 100 GB disk. That should be ok for cube size 5-10 GB.
  1. How do I limit what each user can see in the cube?
    In SSMS, + on Cube Name, right click Role, new role, Dimension Data, select the dimension, tick the number you want the users to see.
  1. Error when processing the cube: the attribute key was not found.
    In SSDT, in Solution Explorer pane, double click the Dimension, look at the DSV panel. Make sure all the dimension key values in the fact table exist in the dimension table. If your dimension is made up of 2 tables chained together, make sure the referenced value exist in the referenced table.
  1. How do I find out the MDX of the query?
    Install OLAP Pivot Table Extension: link (thanks to Greg Galloway)
  1. How do I browse a cube?
    In SSMS, right click the cube name (not the database name), Browse.
    Better: open Excel, Data menu, From Other Sources, Analysis Services, type the SSAS Server Name, Next, select the cube name, Finish.
  1. How do I query the cube data using SQL?
    select * from openquery(LinkedServer1, ‘select Measure1 on columns from Cube1’).
    LinkedServer1 is setup as Provider: OLE DB for SSAS 11, Product Name: MSOLAP.5, Data source: AS Server Name, Catalog: AS DB Name.
  1. The cube is very slow
    Put a trace on SQL Profiler to capture all MDX that hit the cube. Find the long running MDX. Optimize using cube partitioning, aggregation, look at cell-by-cell calculation, check cell-based security, check server memory, use cache warming, don’t process the cube during office hours.

20 May 2014

Banding and Grouping in Data Warehousing

Filed under: Analysis Services,Data Warehousing — Vincent Rainardi @ 6:53 pm

Reading Chris, Marco, and Alberto’s latest SSAS book (link), I came across banding and grouping. I thought I’d share my own experience about banding and grouping. In this article I would limit it to what they are and how to implement it.

What are they?

Grouping is: classifying text data into groups. For example: when we have 10,000 customers and we want to put them into 10 groups, that’s grouping. Or when we have 300 different occupations, and we want to group them into 15 groups, that’s grouping.

Banding is: classifying numeric data into buckets. For example: when we have have an attribute called Customer Age, and we want to put them into 4 bands (17-25, 26-40, 41-50, 50+), that is called banding. When we have a measure called Purchase Value, and we want to put them into 5 bands (<1k, 1-3k, 3k-5k, 5-10k, 10k+), that’s called banding.

A band can have a name, for example: age 17-25 may be called Young Age, and 41-50 is called Middle Age. A group always have a name. If a band has no name, it is usually named by their values, i.e. “5-10k”.

Banding can be in a dimension or fact table, i.e. the numbers we are putting into buckets can be located in a dimension table or a fact table. But grouping is always in a dimension, never in a fact table.

Grouping can be multi-level, and we can make a hierarchy. For example, for Occupation we can have 2 levels: Level 1 consists of 15 groups and Level 2 consists of 60 groups. For example, we can have a Level 1 Occupation Group called Doctor and Nurses, which consists of 4 Level 2 Occupation Groups: GP, Surgeon, Nurse and Specialist.

An attribute can be grouped in several ways. For example: customers can be grouped by value and by time. So we can create an attribute in the customer dimension called “ABC Group” to group customers into Group A (), Group B

How to implement it?

There are two ways to implement Grouping and Banding: fixed approach and flexible approach.

  1. Fixed approach: create a new column in the dimension table containing the group name.
  2. Flexible approach: create an intermediate fact table. We then use many-to-many relationship in SSAS to connect this new fact table to the main fact table.

The Fixed Approach is used when there is only 1 grouping/banding scheme in the whole company. For example, to band the customer age, all users agree that it should be 17-25, 26-40, 41-50, 50+. Fixed Approach can also be used if there are 2 schemes. Maximum of 3, beyond 3 schemes it becomes impractical.

The Flexible Approach is used when there are many schemes in the company. For example, when we try to get an agreement among the business users to group the occupations, we could not get a concensus between different sales teams. The Flexible Approach enables each user, or each team to create their own scheme.

Fixed Approach

Let’s take Occupation in DimCustomer as an example. We create a new column called OccupationGroup. We modify the ETL to populate this column by reading a staging table which maps each occupation to an occupation group.

For banding, the mapping table in Staging specifies the range, i.e. 17-25, 26-40, 41-50, 50+, and the ETL code looks up this table to determine the correct age band for each row in DimCustomer.

For banding on a measure, we need to create a “transaction dimension” which has a column called Band. For example, for the Purchase Value banding, we create a transaction dimension, one of each purchase, which has a column called Purchase Value Band. The reason for creating this Transaction Dimension is: we should not put a textual attribute in the fact table.

Flexible Approach

For the Occupation example, we create an intermediate fact table with 3 columns: Occupation Scheme, Occupation Group and Customer Key. These 3 columns are:

  • Customer Key is the surrogate key to the customer dimension.
  • Occupation Group contains the name of the group, such as GP, Surgeon, Nurse and Specialist.
  • Occupation Scheme contains the name of the banding scheme, for example: the Financial marketing team would like a banding with 15 occupation groups (scheme 1), the European sales team would like a scheme with 20 occupation groups (scheme 2), and the UK sales team would like a scheme with 25 occupation groups (scheme 3).

The many-to-many feature in SSAS enables the cube users to see only their schemes, because this is limited in the role settings. M2M enables us to “connect” a fact table to a dimension via another fact table. In this case, we are using M2M to connect the main fact table to the customer dimension (Occupation is an attribute of the customer dimension) via the above fact table.

How? On the “dimension usage” tab, the main fact table is connected to the customer dimension on the customer key attribute. The above intermediate fact table is also connected to the customer dimension on the customer key attribute.

Still on the “dimension usage” tab the intermediate fact table is connected to a new dimension called “Occupation Group”. This Occupation Group dimension has a hierarchy containing 2 levels: Occupation Scheme and Occupation Group. When the users put this Occupation hierarchy on the Row Label area of the Excel Pivot Table, they will see Occupation Scheme on column A, and Occupation Group on column B.

They can then put the Occupation attribute from the Customer Dimension on the Row Label area, underneath the Occupation hierarchy. When they do that, they will see the Occupations on column C. They can then put any measures in the “Value” area of the Pivot Table as usual, effectively enable them to slice the measure using the occupation groups.

In the roles, we specify “who can see what”, i.e. the financial marketing team can access scheme 1, the European sales team can access scheme 2, and the UK sales team can access scheme 3. We do this using dimension security. This way, members of the financial marketing team only see the Occupation Groups in Scheme 1, and members of the European sales team only see the Occupation Groups in Scheme 2.

Advantages and Disadvantages

The fixed approach is simple and suitable if we only have 1 or 2 grouping scheme. It does require us to modify the structure of the warehouse and cube though, to add a grouping scheme.

The flexible approach takes longer to build, requires the setting up of ProActive caching in SSAS (which could be tricky), but we don’t need to make any code change to add a grouping scheme. It is suitable for a large group of BI users which every team insists on their own grouping.

 

Blog at WordPress.com.