Data Warehousing and Business Intelligence

20 November 2009

What Are Cubes Bad At?

Filed under: Uncategorized — Vincent Rainardi @ 11:32 pm
Tags:

(OLAP) Cubes are built for analysing data by doing slicing and dicing, drilling up and drilling down. Cubes’ distinct advantage (over reports) is their ability to contain millions of numbers and be delivered to the users’ finger tips, allowing them to browse the data at will, at lightning speed. People who does OLAP* as part of their job sometimes feel that the experience is like Minority Report, where Tom Cruise explored the glass surface containing millions of pictures with those special gloves.

This ‘power at your finger tips’ is sometimes so powerful so that the users forgot that is an OLAP tool, not a reporting tool. But when I said that, nearly always they asked back “What’s the difference?” What’s the difference between delivering data warehouse information through reports and delivering it through cubes? What are reports good at but cubes are absolutely rubbish at it?

There are 3 things that cubes are particularly bad at: (they affect OLAP cubes in general, not only SSAS)

  1. Cosmetics
  2. Columnar reports
  3. Text

1. Cosmetics.
OLAP engines are not good at cosmetics/appearance. There is no formatting. No fancy fonts, bold, italics, logo, images, colour, print layout, screen layout. Cube browser like ProClarity delivers the data bare and plain. Just the numbers. OK I might be overemphasised it. Cube browsers do fonts and charting as well. But it is limited. Where as in reporting world, cosmetics are in abundance. Ask any report developer and they will agree that querying the database and getting the data into the report is only half of the job. A large chunk of the work is about formatting and appearance. Both on the screen and when printed (or exported to PDF).

In SQL Server world, this issue is addressed by:

  1. Exporting the data from the cube (say using ProClarity) into Excel and format it in Excel. So charting, layout are all done in Excel. The function of the cube browser is just to extract the data from the cube.
  2. Browse the cube directly from Excel 2007 or 2010, either using Pivot tables or R2’s add-in of PowerPivot. This way we can format the appearance.
  3. Users order dashboards from IT (or KPI, scorecards, indicators). IT uses tools that read the cubes and display the cube data in the form of dashboards (say using PerformancePoint Dashboard Designer). All the users have to do is to look at the fully-baked dashboards and consume/use the information.
  4. Users order reports from IT. IT uses report design tools like SSRS to “wrap” the cube data in nice layout with pretty format. Like dashboards, the solution is fully baked and all the users need to do is to consume the information.

2. Columnar reports

Columns are the worse enemy of OLAP. Column in its tabular sense, as in a vertical line of data. It is probably clearer if we use picture. This is an example of a 7 columns ‘report’:

Attr1 Attr2 Attr3 Attr4 Attr5 Attr6 Attr7 Measure1
member member member member member member member value
member member member member member member member value
member member member member member member member value
member member member member member member member value

When a user uses Excel or ProClarity to compose a ‘report’ from a cube, he or she may think “I can put as many columns as I like” because “the data is stored in database tables”. When we explained that if column1 is an attribute with 1000 members (we usually need to explain what a member is first), and column2 is an attribute with 1000 members, then a 7 column reports can potentially have a lot of rows: 1000 x 1000 x 1000 x … until 7 times. That is a number with 21 zeros. And that will either make the query slow (like taking 5 minutes to complete), or cause the query to fail, as it hits the 4.2 billion tuples limitation.

In practice, we can try to address the issue of ‘too many columns’ using:

  1. Use the Non Empty buttons in ProClarity (automatically applied if the cube is in SSAS 2008) which will make the report ‘blastening fast’. Here’s a simplified way to explain it to the user: suppose out of the 1000 members of attribute1, only 10 members have values for measure1. Like wise attribute2 (the 2nd column), only 10 members have values for measure1. So the number of rows with values on measure1 is: 10 x 10 x 10 x … until 7 times, which equal to 10 millions. So from the gazillion number with 21 zeros we are now down to 10 million. And that’s why the report is a lot quicker, from 5 minutes down to 2-3 seconds.
  2. Put the attributes from the same dimension next to each other. This way, as Mosha explained, SSAS will run inner join against each dimension separately, and then do full cross join between dimensions”. The query will be ‘faster’ because the attributes from the same dimension ‘hang’ off the dimension key, so SSAS doesn’t need to go deeper than the dimension key.
  3. Other ways include: move some attributes from rows to columns or background, specify the members for particular attribute(s), split the report into 2 pages.
  4. If it is a static report or dashboard, i.e. we compose the MDX query ourselves, we can use NonEmpty on each attribute before cross joining across all attributes, as Darren Gosbell suggested here.

3. Text. Text is another thing which OLAP engines are not so good at.

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: