Data Warehousing and Business Intelligence

30 November 2010

SSAS Developer Interview Questions

Filed under: Analysis Services,Other — Vincent Rainardi @ 9:47 pm
Tags: ,

There are 5 key areas to test:

  1. Cube design: the stages of creating a cube in BIDS, i.e. data source, DSV, dimension, cube.
  2. Performance tuning: troubleshooting query performance as well as processing performance.
  3. MDX: the multi dimensional expression that we all love.
  4. Client tools: creating reports and dashboards on Strategy Companion, Excel, Tableau, Panorama, etc.
  5. Administration: installing and upgrading SSAS servers, configuring SSAS server settings, security, processing, deployment, migration, backing up and restoring the cubes.

Beyond these 5, I think it is very important for a SSAS developer (and ETL developer) to understand the concept of dimensional modelling well. Hence I like throw in one or two Kimball questions. Usually candidates understand dimensional modelling, at least the basic, but we need to make sure.

1. Cube design

Out of the 5 things above, Cube Design is the most important one. It is the bread and butter of the job. We need to establish 2 things here:

a) Do the candidates know the basic cube design (things that your company use now)

b) Do they know the advanced cube design (features that your company don’t use now)

Because it depends on the SSAS features your company use, what constitutes “basic” are different from company to company. But it’s probably not far from this list:

  • Data source: connection string, impersonation
  • DSV: table, view, named query, relationships between entities
  • Dimension: hierarchy, sort order, attribute relationships, OrderByAttribute, display folder, default member
  • Cube: referenced dimension, many-to-many, partitioning, aggregation, measure expression, format string, calculated members, error configuration, display folder
  • Role: membership, cube access

Advanced cube design:

  • DS & DSV: multiple DS, multiple DSVs, logical primary key, friendly name, retrieve relationship, schema restriction, number of connections
  • Dimension: parent child dim, translations, date calculation dimension, multiple KeyColumns, ragged dim, type 2 dim, custom rollup, unary operator, write enabled, ROLAP.
  • Cube: actions (URL & drill through), translations, perspectives, HOLAP, proactive caching, input dimension (dimension which is not connected to any MG, but used in calculation), IsAggregatable, KPI.
  • Role: dimensional security, cell security
  • AMO: doing all of the above programmatically using C#.

I usually like to establish first if the candidate can do the job, i.e. basic stuff above/things that we used in the company. I do this by picking 3 from the basic stuff, see if they are comfortable. If they do, I pick 3 from the advanced stuff. If they are not comfy with the basic, the question I need to answer is whether the candidate would be able to sort it out by themselves on the job. I do that by giving them some hints.

Whenever possible, I prefer to wrap the question in a business context, rather than bare theory. For example, question a) below is with business context, where as question b) is theory:

a) Explain how you handle currency conversion.
They can use measure expression, many-to-many, but we know immediately if they have done this stuff or not.

b) Explain different modes of impersonation.
That is just theory. I prefer to give a context i.e. when would we want to use service account and when to use a specific user name? Or: if the connection string already defines how to connect to the database, then what’s impersonation for?

2. Performance tuning

I think what I wrote here [Optimising Cube Query and Processing Performance] is quite comprehensive to be used as a base for our questions. As per its title, it covers the troubleshooting of both query performance and processing performance.

For query performance we can ask MDX queries, aggregation, partitioning, server settings. For processing performance: dimension, materialized view, partitioning and the most important of them all is incremental processing. I explain more about processing performance here.

Again I like to wrap them in a case, for example: a cube takes 5 hours to process and we need to get it down to 1 hour. How can we do that? Many candidates never optimise dimension processing, but any good SSAS developer will at least understand partitioning. An expert developer may even give you a lecture about why aligning table partitioning and incremental partition processing are very important.

Another favourite question of mine is what to do if a) a query is slow, and b) a certain measure is slow (other measures are fine). For a) trap the MDX, run it in MDX Studio (thanks Mosha) and find out which part makes it slow by altering the query bit by bit. For b) establish first if it’s a calculated measure or base measure. If it’s calculated, look at the MDX and analyse on MDX Studio. If it’s a base measure, look at the storage mode (ROLAP?), the dimension usage tab (relationship type = fact or many to many? Try removing some relationships & reprocess), measure expression (is it a simple A/B formula?), simplify the attributes involved in a query (perhaps the issue is with the attributes, not with the measures?).

3. MDX

This is always tricky. Not about figuring out what to ask, but what level of MDX do you need in your company/project. Any SSAS developer can do basic MDX, right? Not quite. Yes they can do the basic MDX select statement, but they might not know even the basic stuff like filter and sorting. So we need to test those. I’d recommend the following list:

Count, YTD, parallel period, filter, existing, order, first/lastchild, as/descendents, currentmember, if/iif/case, existing, head & tail, crossjoin, nonempty, except, scope, top/bottomcount.

They are considered every day use. Any SSAS developer who always queries their cubes using client tools would be separated from those who write MDX to query the cubes. Obviously we don’t need to ask all of them, just pick 3 or 4.

Before we test those functions, it’s worth establishing if they grasp the concepts of multidimensional database (OLAP) or not. For example: tuple, set, cube space, axis, level, member. If they can’t differentiate a tuple from a set, it’s probably not worth asking them any MDX function.

You might argue that “O but we don’t use MDX here”. But it is essential that they understand the concept and the basic functions. You can’t effectively compose SSRS reports displaying data from your cubes if your SSAS developer can’t write MDX. You can’t effectively build a cube if you don’t understand multidimensional database concepts.

XMLA (ASSL) is a bit different. They are not as important as MDX because you can script them on SSMS. Whether it is to create or to alter SSAS objects, or to process them, you can script them on SSMS. Besides, we have AMO for creating/altering SSAS objects programmatically. With AMO you get proper development environment like looping and branching, so you have more control. And it’s .NET. True that you can use tools like Excel, Panorama & ProClarity to get the MDX, or use Profiler to trap Strategy Companion & Tableau’s MDX statements, but you can’t optimise the calculated members if your SSAS developer doesn’t know MDX.

Another thing to consider is whether you need to cover DAX. This depends on whether you are using (or will be using) PowerPivot or not. But DAX is becoming more and more central in MS OLAP technology so in the long run (like 2 years from now) you will need to cover DAX. But for now it’s still optional, depending on the use of PowerPivot.

4. Client Tools

Specific features are different from one client tool to the next, but the basic ideas are similar: creating report, chart and dashboard. Some of the specific features to test are: create custom set, create calculated member, time calculation function such as YTD and QTD, hyperlink to differ reports, drilldown. One of my favourite questions is whether a specific calculation should be done in the client tool or in SSAS.

The administration of the client tool is usually done by somebody else, not the SSAS developer, so we shouldn’t worry about it in this interview. But if you are a small company then usually one person does it all, in which case you will need to ask the candidate about installation, migration, user security, and other aspects of the client tool.

5. Administration

Administration tasks are not the responsibility of an SSAS developer. They are the responsibility of the SQL DBA. These days, in large corporations, SQL Server DBAs are asked to manage all 4 aspects of SQL Server. They manage not only the relational engine, but also SSAS server, SSIS packages and SSRS servers. But in smaller companies SSAS developers are involved in administering the servers. So I include this section (section 5, Admin) in the interview.

The Questions

Enough with the background, let’s get on with the questions. I’m going to label each question with 1-5 indicating the 5 areas above, and LMH (low, medium, high) indicating the difficulty. I’ll put the label in curly brackets: {}. Hopefully the labels help you compose the right mix of interview questions.

  1. How do you ensure that January, February, March, etc will be in the correct sequence when the user browse the cube? {1-L}
  2. In SSAS, how do you design currency conversion for a) from many transaction currencies to 1 reporting currency, b) from 1 transaction currency to several reporting currencies and c) from many transaction currencies to many reporting currencies? {1-M}
  3. A user reported that a Panorama report that she uses every day is very slow today. Normally it opens in 1 or 2 seconds, but today it’s didn’t open at all (she has waited for 4 minutes). Explain the approach how you are going to handle this case systematically. {4-L}
    Note: you can still ask this question if though you are not using Panorama.
  4. We have a risk analysis cube. This cube is 500 GB in size, containing 20 billion fact rows. Explain how we: a) process this cube efficiently; b) ensure a good query performance. {2-M}
  5. Explain (step by step) how you build a cube from a data mart. {1-L}
  6. Explain how you migrate a cube from a dev AS server to production. {5-L}
    Follow up with advantage/disadvantage of each approach. {5-M}
  7. A cube has 300 partitions and we want to process only the partitions where the fact table row changed. We don’t want to process the entire measure group. How do we do that? {2-M}
  8. We have a cube with 1 measure and 1 dimension. That dimension has only 1 attribute. Write an MDX to list the members of that attribute of which measure1 is bigger than 10. We want to sort it in descending order, i.e. largest amount first. {3-L}
  9. Like above but sort on the attribute, not on the measure. {3-L}
  10. People say that we shouldn’t use cell security because it’s slow. Explain how we should use it then? How do you combine it with dimensional security? {2-H}
  11. What’s the difference between a tuple and a set? What is an axis? {3-L}
  12. You need to grant access to 200 users to access a cube. Each user can only access certain data within the cube. The access is by product and region, i.e. user1-20 can only access product1-50 and region1-3, user21-40 can only access product30-80 and region2-5. How would you do that? {5-M}
  13. Some users need to see the sales figures in USD, other users in GBP. How would you do that? {1-H}
  14. You need to upgrade 4 SSAS servers (dev, UAT, prod, DR) from 2005 to 2008. Explain would you approach this. {5-L}
  15. People say that we can have a real time cube using ROLAP or ProActive Caching. Explain how, and the plus/minus of each approach. {1-H}
  16. Cube processed OK but all measures return no value. How do you troubleshoot this issue systematically? {1-M}
  17. How do you do Slowly Changing Dimension type 2 in SSAS? {1-M}
  18. The data mart only has 1 measure: amount. This measure could be asset, expense, income, etc depending on an attribute called “Account”. How do you implement this in SSAS cube? {1-H}
  19. The value of measure “price” is only valid if the product dimension is at the leaf level and the location dimension is at the “store” level. How do you set it? {2-M}
  20. Attribute relationship: In what case would we want to set: a) the cardinality to “one to one”? b) The relationship type to rigid? What’s the risk (of doing so)? {1-M}
  21. Why do you need to design aggregations? How do you design aggregations (usage-based and non usage-based)? Explain how you test it. {1-H}
  22. People say that perspective is not a security measure. What do they mean by that? {1-M}
  23. Star vs snowflake: what’s the benefit of having a snowflake data mart in SSAS? {1-M}
  24. We have a dimension which is as big as the fact table (in terms of the number of rows). What do you suggest we do with this dimension? And with regards to SSAS, what would you do differently? {1-H}
  25. Excel: how do you change language from English to French when browsing a cube?

Answers:

  1. Order by key and put month number in the key column. If they can’t answer this you should worry.
  2. It’s best if the candidate can explain about the currency conversion BI wizard. But if he can explain the “traditional way” it’s quite good too: using calculated measure and many-to-many relationship.
  3. As with any support call, we need to find out which report. Try to reproduce what the user did in NovaView. Find what the MDX is (Tools menu, Direct MDX, ctrl-alt-V), execute in SSMS. If this is slow then find out which part of the MDX makes it slow by building the query step-by-step. Then consider: a) aggregation, b) block computation, c) convert to base measure (do calculation in the relational/ETL), d) improve the MDX if it’s a calculated measure, e) partitioning. Most of the time, when a user says that usually it is 2 seconds but today it timed out, it is not the MDX or the cube structure. It’s usually either the server or the connection. So check that a) the server is up and running and b) from Panorama you can “open” the cube in that server.
  4. Incremental processing (IP). If you have huge cubes and the candidate has never done IP before, end the interview. If you have huge cubes in your company (100 GB), your questions should be focused on performance (processing & query). The job spec should clearly say: wanted: SSAS performance tuning specialist (not a “normal” AS developer) and prepare to pay 20% premium.
  5. It’s an easy question; every single candidate should be able to answer this question as it is bread and butter. But their answer shows their class. One of the important things I’m looking for here is requirement analysis. An SSAS developer who “just build it” without analysing the requirements is usually less helpful then those who analyse the requirements. The other thing I’m looking for here is performance, e.g. when building the DSV do they ensure that the SQL is efficient? When creating calculated measure, do they ensure that the MDX is efficient? The last thing I’m looking for here is how they do unit testing.
  6. a) backup-restore, b) script (XMLA)-execute-process, c) deployment wizard.
  7. First detect (in relational) which partition changed (you need some kind of flag / timestamp column on the fact table). In the metadata table, update the status of each partition of which the data has changed. Then based on this metadata table, process the changed partitions using either a) AMO or b) “AS execute DDL” task in SSIS.
  8. See below.
  9. See below.
  10. Look at Jason Thomas’ post herehere, and Bryan Smith’s post here.
  11. A tuple is a cut of a cube; a set is a collection of several tuples (could be 1 tuple). A tuple is enclosed with parentheses: (); a set is enclosed with curly brackets: {}. An axis is an edge of a multidimensional result set returned by an MDX query (columns, rows, pages, sections, chapters)
  12. Put users into AD groups and add these groups as the members of the cube roles. For each role, define the dimensional security (dimension data tab) on product and region dims.
  13. Create 2 roles: USD and GBP and add users into these roles via AD groups. In each role, set the default member in the currency dimension using dimensional security (dimension data tab in the role).
  14. Upgrade Dev first (document the process), test the cubes by running the reports (Panorama/Strategy Companion/Excel, etc). Use the documented process to upgrade UAT, and test the UAT cubes. Then upgrade prod and DR at the same time (using the documented process). Or DR first, test then Prod. Before upgrading Dev, you might want to copy a Prod cube to Dev and export some reports into Excel. After upgrading Dev, run the same reports and compare with the results before upgrade.
  15. There are 2 things here: dim and MG. In reality, a real time cube doesn’t mean that all dims and all MGs are real time. But only 1 or 2 MG are real time. Usually all dims are MOLAP, perhaps with the exceptions of 1 or 2 dims. To make an MG real time, we can define it as ROLAP storage mode. For performance reason, best not to define the whole of MG as ROLAP, but only 1 partition (usually the last one, if partition by month/period). Unless it’s a small MG (<100,000 rows). For dims, to make it ROLAP set the StorageMode property. ProActive Caching (PC) provides automatic management of MOLAP storage. The cube incorporates the changes in the relational tables, based on a notification mechanism. You can do PC on partition and dim. If the fact table (or partition of the fact table) is small, and data is changing frequently (like every minute), best to use ROLAP. If the fact table (or partition of the fact table) is large (like 8m rows in partition), and the data is rarely changed, best to use PC.
  16. Do “explore data” on the fact table on DSV. Run the SQL if it’s a named query. If this doesn’t return anything, check the fact table. If this returns rows, check the dimension usage tab. Focus on 1 MG. Remove all relationships (keep the cube dims), reprocess cube and see if you get a measure at the top level. If you do, add the relationship to the dims one by one until you find out which dim causing the issue.
  17. No special settings, just create the dim as normal from the SCD2 table. The only difference that it has 3 extra attributes: IsCurrent, Valid From & To date.
  18. Parent child dim, unary operator column, custom rollup.
  19. Scope statement
  20. a) If for each member of attribute1 there is only 1 member of attribute2, b) if “member mappings” are fixed. Risk: if a member changes its mapping.
  21. Why: to increase query performance. How (usage based): OLAP query log. How (non usage based): based on attributes which are queried/used most. Refer to SSAS2008PerfGuide section 3.4.3.1. Test in Mosha’s MDX Studio, before and after, 2 different cubes, same query.
  22. You can’t prevent access to dim/attribute/measure using perspective. They just can’t see it, but if they know the name of the dim/attribute/measure they can access it.
  23. Flexibility in building a dimension, speed of processing. No difference in query performance.
  24. Dimensional modelling: break into smaller dimensions and/or snow flake the dim. SSAS: limit the number of attributes (don’t just take all attributes but be selective), index on those attributes (relational), limit the number of rows going into the cube by using a where clause on the DSV/view, put attributes with lower grain (smaller number of distinct values) into a referenced dim, linked via the main dim to the MG.
  25. Modify the connection string in the ODC file, see my post here.

Number 8:

select order
( filter
  ( {[dim1].[attribute1].[All].Children},
    [Measures].[Measure1] &amp;gt; 10
  ),
  [Measures].[Internet Sales Amount],
  desc
) on rows,
[Measures].[Measure1] on columns
from [Cube]

Number 9:


select order
( filter
  ( {[dim1].[attribute1].[All].Children},
    [Measures].[Measure1] &amp;gt; 10
  ),
  [dim1].[attribute1].CurrentMember.Name,
  desc
) on rows,
[Measures].[Measure1] on columns
from [Cube];

 

 

About these ads

4 Comments »

  1. thanks…keep posting
    really great help for interviews ..

    Comment by vijaykashyap — 1 December 2010 @ 5:52 am | Reply

  2. Very useful article! I partcularly like the way you have categorized your interview questions and assigned difficulty levels. By the way, I am praying that I never get interviewed by you, tough questions :P

    Comment by Jason Thomas — 2 December 2010 @ 5:51 pm | Reply

  3. [...] my article last week “SSAS Developer Interview Questions”, which generated quite a lot of responses, I thought I’d write similar thing on the data [...]

    Pingback by Data Warehousing Interview Questions « Data Warehousing and Business Intelligence — 11 December 2010 @ 11:01 pm | Reply

  4. [...] — Vincent Rainardi @ 12:16 pm Tags: Data Warehousing Following my article last week “SSAS Developer Interview Questions”, which generated interesting responses, I thought I’d write similar thing on the data [...]

    Pingback by Data Warehousing Interview Questions « Data Warehousing and Business Intelligence — 27 December 2010 @ 12:29 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 188 other followers

%d bloggers like this: