Data Warehousing and Data Science

9 July 2012

Duplicate Attribute Keys in SSAS

Filed under: Analysis Services,Data Warehousing — Vincent Rainardi @ 5:32 pm

Sometimes when we process the cube we get an error message: duplicate attribute key. And at other times we get: missing attribute key. In this article I will try to address these 2 issues.

A. Duplicate Attribute Key

There are 2 things to check in this case:

  1. Do you use the surrogate key, or the business key, to link the fact table to the dimension?
  2. Is there a duplicate business key in the dimension table?

1. Surrogate Key vs Business Key

In the data warehouse, to link the fact tables to the dim tables we must use surrogate keys. We must not use the business key to connect a fact table to a dim table. But time and time again I have seen people who have not read Kimball and Caserta’s book thoroughly try to build a data warehouse quickly, using the business key to connect fact and dim. It is easier to understand with an example, so I’ll use an income fact table.

FactIncome:
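(illustrative rows; the exact columns are assumed)

SnapshotDateKey  WorkId  IncomeAmount
20120709         781     140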

DimWork:
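(illustrative rows; the exact columns are assumed)

WorkKey  WorkId
2        781
3        781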

The reason we have 2 rows for WorkId 781 is because DimWork is SCD type 2.

The above example shows that they link FactIncome to DimWork using the business key: WorkId. This is a fatal error, because when we query "What is the IncomeAmount for WorkId 781 on 9/7/12?" the DW will return 2 rows to the cube:

select DW.WorkId, DW.WorkKey, F.IncomeAmount
from FactIncome F
join DimWork DW on DW.WorkId = F.WorkId
where F.SnapshotDateKey = 20120709

Output:

WorkId  WorkKey  IncomeAmount
781     2        140
781     3        140

And SSAS will sum the two 140s up into 280.

It is vital that in the fact table we use the surrogate key to connect to the dimension table, like this:

FactIncome:
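(illustrative row; the fact table now carries the surrogate key WorkKey instead of the business key WorkId)

SnapshotDateKey  WorkKey  IncomeAmount
20120709         2        140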

So when we query "What is the IncomeAmount for WorkId 781 on 9/7/12?" the DW will return only 1 row to the cube:

select DW.WorkId, DW.WorkKey, F.IncomeAmount
from FactIncome F
join DimWork DW on DW.WorkKey = F.WorkKey
where F.SnapshotDateKey = 20120709

Output:

WorkId  WorkKey  IncomeAmount
781     2        140

2. Is there a duplicate business key in the dimension table?

The second reason for getting "duplicate attribute key" in SSAS is that there is a duplicate business key in the dimension table.

This could happen if there are 2 rows in the source table with the same business key, but the business key on the second row has trailing spaces. For example:

Row 1: CustomerId = 'GA34'
Row 2: CustomerId = 'GA34  '

The ETL considers these 2 rows as two different customers and allocates them different surrogate keys in the customer dimension.

If you use the business key as the key of the SSAS dimension, SSAS will say that there is a duplicate attribute key in the customer dimension, i.e. there are two rows with CustomerId = 'GA34'. This is because SSAS trims the attribute key.
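A quick way to find these rows is to group on the trimmed business key, something like this (table and column names illustrative):

select rtrim(CustomerId) as CustomerId, count(*) as NumberOfRows
from DimCustomer
group by rtrim(CustomerId)
having count(*) > 1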

Of course you can suppress the issue using custom configuration (in the properties of the dimension), but this only hides the issue and makes the front end (say Tableau or Strategy Companion Analyser) return incorrect data to the users.

This issue will not occur, by the way, if you use the surrogate key. It will only occur if you use the business key as the key of the dimension in SSAS.

B. Missing Attribute Key

The second error, missing attribute key, means that there is a surrogate key in the fact table which does not exist in the dimension table. We can isolate the fact row that gives the error message by looking at the details of the error message that SSAS gives us when we process the cube. The value of the key is given in the message. We can then query the dim table to check if the dim table has a row with that SK.
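Once we have the key value from the error message, we can check both sides with something like this (table names and the key value 12345 are illustrative):

-- does the dimension have a row with this surrogate key?
select * from DimWork where WorkKey = 12345

-- list the fact rows whose surrogate key is missing from the dimension
select F.*
from FactIncome F
left join DimWork D on D.WorkKey = F.WorkKey
where D.WorkKey is null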

A possible reason why the SK exists in the fact table but not in the dim table is a mistake in the ETL task that populates the fact table. The ETL task can be an SSIS workflow or an Informatica session/mapping, but the principle is the same: the part of the workflow or mapping that does the lookup failed, either because the Lookup Transform didn't use the correct business key, or because it didn't look at the correct column.

There is another possibility if it is a type 2 dim: the Lookup Transform didn't get the correct version, i.e. the part of the lookup logic that examines the EffectiveDate and ExpiryDate (or ValidFromDate and ValidToDate) is not working properly.
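For a type 2 dim, the lookup should pick the version that was valid on the transaction date. In SQL terms the logic should be something like this (table and column names illustrative):

select D.WorkKey
from StagingIncome S
join DimWork D on D.WorkId = S.WorkId
and S.TransactionDate >= D.EffectiveDate
and S.TransactionDate < D.ExpiryDate

If these date conditions are wrong or missing, the lookup returns the wrong version (or no version at all), and the fact table ends up with an invalid surrogate key.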

Those are the various possible causes of "duplicate attribute key" and "missing attribute key" in an SSAS cube.

1 September 2011

Tomislav’s MDX Book

Filed under: Analysis Services,MDX — Vincent Rainardi @ 11:15 pm

I met Tomislav Piasevoli for the first time in Manchester in 2009, at the SQLBits conference. I attended his Advanced MDX session and said to him afterwards that his MDX skill was one of the highest in the world.

Last month his new MDX book was published, titled MDX with Microsoft SQL Server 2008 R2 Analysis Services: Cookbook. I am amazed by this book, by how it is structured as well as by the quality of its content. It contains 80 problems, each described in a problem-solution-explanation format.

To me this format is ideal. This is how people should learn: it is always easier to learn by example than by theory. I think all IT forums should have this format, i.e. after 2 years of people asking and answering questions, the moderator should package up the knowledge in the problem-solution-explanation format. I'm sure such a FAQ section would answer 90% of the questions in that forum.

We are all so lucky that Tomislav has done this for us in MDX. I wish somebody would do the same in AS, IS, RS, SQL and all the other areas of DWBI. The "cookbook" format is spot on!

The book contains extremely useful stuff. You only need to look at the table of contents to see how useful it is. From basic, everyday problems to very advanced ones, they are all there: date calculations, formatting, navigation, DMVs, context-aware calculations. This is a book that provides a guiding hand to SSAS developers. I think all SSAS developers need to have this book on their laptop, iPad or mobile phone*. It will help us solve most of the MDX problems in our work. Essentially, this book contains the code that you need to type into your cubes.

*or on their desks, for those who still use the paper version. For this book the electronic version is better, because you can search for your problem and quickly find the solution. Very handy when you have to finish the cube tomorrow!

I need to emphasise that we are very lucky Tomislav spent so much of his time writing this book. Not many people in the world have the required level of skill to do it. As I said at the beginning, Tomislav's MDX skill is one of the highest in the world, along with people like Chris Webb, Greg Galloway, Deepak Puri, Darren Gosbell and Marco Russo, all of whom are reviewers of this book. If top people like them spend their precious time writing and reviewing this book, those of us who work on MDX should read it! It is a real treasure.

This is my favourite MDX book. Thank you Tomislav for writing it.

Vincent Rainardi, 2/9/2011.

Link to the book’s page on publisher’s web site: http://link.packtpub.com/UUrJM6

28 June 2011

Processing ROLAP Cube and ROLAP Dimension

Filed under: Analysis Services — Vincent Rainardi @ 9:41 pm

Someone asked me today: he created a ROLAP cube with MOLAP dimensions. He made some changes to the dim and did a process full on the cube. He expected to see the changes when browsing the cube, but he couldn't, and he was wondering why. If he did a process full on the database, he could see the changes when browsing the cube.

After trying out a few things myself, I responded to him. My response could be useful for somebody else, so I am sharing it here. I understand there are many AS experts out there, and if any of the below is wrong, I will be glad to learn from them.

Process full does not affect all objects. Process full affects only the object that we process.
If we do process full on a cube, then only the cube will be processed. The dimensions will not be processed.
If we do process full on a dimension, then only that dimension will be processed. The cube will not be processed.
To process everything (both cube and dimensions), we need to process the database.

With a ROLAP cube, if you add a row to your fact table, you do not need to process anything. You will automatically see the new row when you browse the cube.

But, in a cube there are measure groups (MG). And in a measure group there are partitions.
If you set the cube to ROLAP, but the MGs and partitions are still MOLAP, then you won’t see the changes in the fact table.
If you set the cube and the MGs to ROLAP but the partitions are still MOLAP, then you won’t see the changes in the fact table.

Because the partitions are MOLAP you need to process the cube to see the changes in the fact table.
It is the partition that determines the overall settings.
If you set the cube and the MGs to MOLAP but the partitions to ROLAP, then you will see the changes in the fact table without processing anything.

Impact analysis:
If you do a process full on a dimension and do impact analysis, you get the partition, the MG and the cube.
If you do a process update on a dimension and do impact analysis, you don't get anything, which means that you don't need to process the cube (in theory).
If you do a process full on a cube and do impact analysis, then you don’t get the dimensions.

As I said above, if your cube is ROLAP, and you change your fact table, you do not need to process anything. You will automatically see the changes when you browse the cube.
But, if your dimension is ROLAP, and you change the value on the dimension table, then you need to process the dimension to see the change.

You can’t see the changes you made on the dimension table if you don’t process the dimension.
You don’t need to process full the dimension. Process update or process data is enough to see the changes.

Going back to "Process full does not affect all objects": if you want to process the affected objects as well, select "Change Settings" and choose "Process affected objects". For example, when doing a process full on a dimension, if you select "Process affected objects" you will also process the partition, the MG and the cube.

Going back to "Process full affects only the object that we process": process full processes both data and indexes.
Process data processes data only; process index processes indexes only.
But process full only processes the object we process, not all objects in the database.

Hope this helps. Vincent Rainardi, 28/6/2011.

24 April 2011

Where To Put an Attribute (SSAS)

Filed under: Analysis Services — Vincent Rainardi @ 6:28 am

Suppose you have a cube and you need to add an attribute to it. Say the cube structure is: 1 MG containing 1 measure, and 1 dim containing 1 attribute.
The physical table structure is: MG1 is from table fact1, and dim1 is from table dim1.
You need to add attribute2 to this cube. Attribute2 is located on table dim2. Fact2 links Dim2 to Dim1.

Here we have 2 options:
1. Create Dim2 in AS, create MG2 and link MG2 to both Dim1 & Dim2
2. Join dim2 to dim1 on DSV (using Fact2), and put attribute2 on dim1

I found that, even though the processing takes slightly longer, the query on option 2 is quicker than on option 1. The query I mean is: list all attribute2 members for a particular attribute1 member. Something like:

select measures.measure1 on columns,
 dim1.attribute2.members on rows
 from cube
 where dim1.attribute1.&[member]

I also remember one occasion where we got a performance benefit by moving an attribute to another dimension. Attrib1 was in dim1 and attrib2 was in dim2. The query "list all possible combinations of attrib1 and attrib2" was slow. Moving attrib1 to dim2 made the query quicker.
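That slow query was something like this (names illustrative):

select measures.measure1 on columns,
 non empty dim1.attrib1.members * dim2.attrib2.members on rows
 from cube

Once both attributes are in the same dimension, autoexist kicks in and SSAS only returns the combinations of attrib1 and attrib2 that actually exist together, instead of the full cross join of the two dimensions.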

Please refer here for a discussion about this by Mosha. My previous article about Autoexist is here.

So it's better to put attributes which are frequently used together in the same dimension.

I should add a note to that last sentence: as long as the grain of the dimension doesn't change a great deal. What I mean is: if, after moving an attribute into the dimension, the number of members increases a lot (say from a million to a billion), it is probably better not to move the attribute.

30 March 2011

Ragged Hierarchy in SSAS

Filed under: Analysis Services — Vincent Rainardi @ 6:08 pm

Someone asked me how to do a ragged hierarchy in SSAS. I thought it was "natural" in SSAS, i.e. using a "parent column" in the table and then making that column a "parent attribute" in a parent-child dimension. But then somebody else asked me the same question and I realised that the issue was that, by default, we would see the parent node repeated as a child node, like this:

Parent nodes are displayed as child nodes

To eliminate it we need to set the MembersWithData property to NonLeafDataHidden.

Here’s the complete scenario.

This is what we want to achieve:

This is the source table:

create table Company (CompanyKey int, ParentKey int, CompanyName varchar(50))
insert into Company (CompanyKey, ParentKey, CompanyName) values
(1, NULL, 'Company1'), (2, 1, 'Company2'), (3, 1, 'Company3'), (4, 2, 'Company4'), (5, 4, 'Company5'),
(6, 1, 'Company6'), (7, 6, 'Company7'), (8, 7, 'Company8'), (9, 7, 'Company9')
select * from Company

We add that table to the DSV (data source view) and create a self-relationship from ParentKey to CompanyKey:

Then we create a dimension:

Then on the parent key attribute we set 2 properties: Usage to Parent, and MembersWithData to NonLeafDataHidden:

And the result is as expected:

Let’s create a fact table, then see how it looks in Excel.

create table FactInvoice (FactInvoiceKey int, CompanyKey int, InvoiceAmount money)
insert into FactInvoice values (1,2,430), (2,2,120), (3,6,220), (4,6,250)
select * from FactInvoice

FactInvoice

Create a cube:

Cube Design

 

Browse in Excel:

Browse in Excel - Field List / Browse in Excel - Columns and Rows / Browse in Excel

Company 1 is the parent of both Company 2 and 6.

 

27 February 2011

Parent Child Dimension: extra child node

Filed under: Analysis Services — Vincent Rainardi @ 8:03 am

In a Parent-Child Dimension (PCD), using the MembersWithData property we can choose whether or not to display an extra child node with the same name as the parent. For example, if we have a table like this:

We expect to see this: (1)

But by default Analysis Services produces this: (2)

The above screenshot shows that, by default, Analysis Services gives us 1 more child node, which repeats the parent.
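For illustration (using a source table like the Company table in the ragged hierarchy post above, where Company1 is the parent of Company2 and Company3), (1) is:

Company1
  Company2
  Company3

but by default we get (2), with the parent repeated as a child:

Company1
  Company1
  Company2
  Company3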

To get (1), change the MembersWithData property from NonLeafDataVisible to NonLeafDataHidden, like this:

We need to set this MembersWithData property on the parent key column, not on the dimension key column.

Thank you to Mourad Faiz for bringing this to my attention. As always I welcome corrections and discussion at vrainardi@gmail.com. Vincent 27/2/11.

29 December 2010

SSAS DMV: Join using DataSet

Filed under: Analysis Services — Vincent Rainardi @ 4:32 pm

In SSAS 2008 we have Dynamic Management Views (DMVs), which are basically XMLA discover commands wrapped as data mining select statements (DMX). In SSMS, if we connect to an SSAS instance, right click on the server name, choose New Query, DMX, type "select * from $system.discover_connections" and click Execute, we will see a list of users currently connecting to that server.

The output is like this (I picked only 1 row and transposed it so we can see the column names clearly):

CONNECTION_ID 42
CONNECTION_USER_NAME domain\user1
CONNECTION_IMPERSONATED_USER_NAME
CONNECTION_HOST_NAME 10.1.27.146:3315
CONNECTION_HOST_APPLICATION SQL Server Management Studio
CONNECTION_START_TIME 29/12/2009 11:34
CONNECTION_ELAPSED_TIME_MS 3089784
CONNECTION_LAST_COMMAND_START_TIME 29/12/2009 11:45:03
CONNECTION_LAST_COMMAND_END_TIME 29/12/2009 11:45:04
CONNECTION_LAST_COMMAND_ELAPSED_TIME_MS 62
CONNECTION_IDLE_TIME_MS 1972821
CONNECTION_BYTES_SENT 3610612
CONNECTION_DATA_BYTES_SENT 15880694
CONNECTION_BYTES_RECEIVED 1502120
CONNECTION_DATA_BYTES_RECEIVED 1502120

And if we type “select * from $system.discover_sessions” we will get:

SESSION_ID D1A8209F-27C4-4013-AB23-154704213CB3
SESSION_SPID 357640
SESSION_CONNECTION_ID 661
SESSION_USER_NAME domain\user1
SESSION_CURRENT_DATABASE Cube1
SESSION_USED_MEMORY 71
SESSION_PROPERTIES
SESSION_START_TIME 29/12/2009 11:57
SESSION_ELAPSED_TIME_MS 3193141
SESSION_LAST_COMMAND_START_TIME 29/12/2009 12:09
SESSION_LAST_COMMAND_END_TIME 29/12/2009 12:09
SESSION_LAST_COMMAND_ELAPSED_TIME_MS 0
SESSION_IDLE_TIME_MS 2476374
SESSION_CPU_TIME_MS 87687
SESSION_LAST_COMMAND {MDX Statement}
SESSION_LAST_COMMAND_CPU_TIME_MS 62
SESSION_STATUS 0
SESSION_READS 103571
SESSION_WRITES 0
SESSION_READ_KB 6593264
SESSION_WRITE_KB 0
SESSION_COMMAND_COUNT 1854

And when we type “select * from $system.discover_commands” we get:

SESSION_SPID 356127
SESSION_COMMAND_COUNT 31
COMMAND_START_TIME 29/12/2009 11:36
COMMAND_ELAPSED_TIME_MS 0
COMMAND_CPU_TIME_MS 281
COMMAND_READS 102
COMMAND_READ_KB 6154
COMMAND_WRITES 0
COMMAND_WRITE_KB 0
COMMAND_TEXT {MDX Statement}
COMMAND_END_TIME 29/12/2009 11:36

For other DMVs please see my June presentation at Microsoft TVP here. Also see my recent article on SSAS DMV here.

We can link discover_sessions and discover_connections on CONNECTION_ID, and we can link discover_sessions and discover_commands on SESSION_SPID. Like this:
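In other words:

discover_connections.CONNECTION_ID = discover_sessions.SESSION_CONNECTION_ID
discover_sessions.SESSION_SPID = discover_commands.SESSION_SPID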

If we want to join the three DMVs (command, session, connection), we have 3 options:

  1. Join using Linked Server
  2. Join using SSIS
  3. Join using DataSet

Midas Matelis wrote about the Linked Server. Basically it’s using OpenQuery.
Chris Webb wrote about the SSIS one.
This article is about using DataSet to join the DMVs. It’s using ADOMD.

What is a DataSet? We can think of a DataSet as an in-memory ERD diagram: a collection of tables which have referential integrity between them.

So, we are going to issue Discover_Connections, Discover_Sessions and Discover_Commands, then join them in the DataSet. The steps are:

  1. Add reference to ADOMDClient
  2. Open a connection and execute Discover_Commands
  3. Create a DataTable for Discover_Commands and populate it
  4. Execute Discover_Sessions
  5. Create a DataTable for Discover_Sessions and populate it
  6. Execute Discover_Connections
  7. Create a DataTable for Discover_Connections and populate it
  8. Add all three tables to the DataSet and join them
  9. Display the joined table to the screen

1. Add reference to ADOMDClient

Create a new C# project. You can use a Windows Forms, WPF or web project, or a Silverlight project if you use VS2010. Add a reference to Microsoft.AnalysisServices.AdomdClient.

2. Open a connection and execute Discover_Commands

AdomdConnection ASConn = new AdomdConnection("Data Source=ServerName;Catalog=;Integrated Security=SSPI;");
ASConn.Open();
AdomdCommand Cmd = ASConn.CreateCommand();
Cmd.CommandText = "select * from $system.discover_commands";
AdomdDataReader R = Cmd.ExecuteReader();

Leave the Catalog empty. We don't need to specify it if we only want to execute discover commands, sessions and connections. We do need to specify the database name for some DMVs.

3. Create a DataTable for Discover_Commands and populate it

First we create a DataTable called tblCommand:
DataTable tblCommand = new DataTable("Command");

Then we get the column names from the schema:
DataTable tblSchema = R.GetSchemaTable();
foreach (DataRow schemarow in tblSchema.Rows)
{
    tblCommand.Columns.Add(schemarow.ItemArray[0].ToString(),
        System.Type.GetType(schemarow.ItemArray[5].ToString()));
}

Then we populate the DataTable from the ADOMD Data Reader:
while (R.Read())
{
    object[] ColArray = new object[R.FieldCount];
    for (int i = 0; i < R.FieldCount; i++)
    {
        if (R[i] != null) ColArray[i] = R[i];
    }
    tblCommand.LoadDataRow(ColArray, true);
}
R.Close();

4. Execute Discover_Sessions

We reuse the data reader:
Cmd.CommandText = "select * from $system.discover_sessions";
R = Cmd.ExecuteReader();

5. Create a DataTable for Discover_Sessions and populate it

First we create a DataTable called tblSession:
DataTable tblSession = new DataTable("Session");

Then we get the column names from the schema:
tblSchema = R.GetSchemaTable();
foreach (DataRow schemarow in tblSchema.Rows)
{
    tblSession.Columns.Add(schemarow.ItemArray[0].ToString(),
        System.Type.GetType(schemarow.ItemArray[5].ToString()));
}

Then we populate the DataTable from the ADOMD Data Reader:
while (R.Read())
{
    object[] ColArray = new object[R.FieldCount];
    for (int i = 0; i < R.FieldCount; i++)
    {
        if (R[i] != null) ColArray[i] = R[i];
    }
    tblSession.LoadDataRow(ColArray, true);
}
R.Close();

6. Execute Discover_Connections (reuse reader)
Cmd.CommandText = "select * from $system.discover_connections";
R = Cmd.ExecuteReader();

7. Create a DataTable for Discover_Connections and populate it

First we create a DataTable called tblConnection:

DataTable tblConnection = new DataTable("Connection");

Then we get the column names from the schema (tblSchema is defined above; the column name is in the 1st column, the data type in the 6th column):

tblSchema = R.GetSchemaTable();
foreach (DataRow schemarow in tblSchema.Rows)
{
    tblConnection.Columns.Add(schemarow.ItemArray[0].ToString(),
        System.Type.GetType(schemarow.ItemArray[5].ToString()));
}

Then we populate the Connection DataTable:

while (R.Read())
{
    object[] ColArray = new object[R.FieldCount];
    for (int i = 0; i < R.FieldCount; i++)
    {
        if (R[i] != null) ColArray[i] = R[i];
    }
    tblConnection.LoadDataRow(ColArray, true);
}

Then we close the data reader and the connection, because we have finished reading data from SSAS. After this we only play in memory.
R.Close();
ASConn.Close(true);
ASConn.Dispose();

8. Add all three DataTables to the DataSet and join them

Create a DataSet and add the DataTables:
DataSet DS = new DataSet();
DS.Tables.Add(tblCommand);
DS.Tables.Add(tblSession);
DS.Tables.Add(tblConnection);

Create a relationship between Session and Command on Session SPID:
DataColumn ParentColumn = DS.Tables["Session"].Columns["SESSION_SPID"];
DataColumn ChildColumn = DS.Tables["Command"].Columns["SESSION_SPID"];
DataRelation Rel = new DataRelation("Session_Command", ParentColumn, ChildColumn);
DS.Tables["Command"].ParentRelations.Add(Rel);

Create a relationship between Connection and Session on Connection ID:
ParentColumn = DS.Tables["Connection"].Columns["CONNECTION_ID"];
ChildColumn = DS.Tables["Session"].Columns["SESSION_CONNECTION_ID"];
DataRelation Rel2 = new DataRelation("Connection_Session", ParentColumn, ChildColumn);
DS.Tables["Session"].ParentRelations.Add(Rel2);

Join the three tables (David M provides a good reference, using arrays):

First we create a DataTable called tblJoin to store the final result:
DataTable tblJoin = new DataTable("Join");

Then we create the columns of tblJoin by getting the column names and data types from tblConnection, tblSession and tblCommand. To avoid duplicate column names, we append "2" to the second occurrence of a column name.

for (int i = 0; i < tblConnection.Columns.Count; i++)
{ tblJoin.Columns.Add(tblConnection.Columns[i].ColumnName,
     tblConnection.Columns[i].DataType);
}

for (int i = 0; i < tblSession.Columns.Count; i++)
{ if (!tblJoin.Columns.Contains(tblSession.Columns[i].ColumnName))
    tblJoin.Columns.Add(tblSession.Columns[i].ColumnName, tblSession.Columns[i].DataType);
  else
    tblJoin.Columns.Add(tblSession.Columns[i].ColumnName + "2", tblSession.Columns[i].DataType);
}

for (int i = 0; i < tblCommand.Columns.Count; i++)
{ if (!tblJoin.Columns.Contains(tblCommand.Columns[i].ColumnName))
    tblJoin.Columns.Add(tblCommand.Columns[i].ColumnName, tblCommand.Columns[i].DataType);
  else
    tblJoin.Columns.Add(tblCommand.Columns[i].ColumnName + "2", tblCommand.Columns[i].DataType);
} 

tblJoin.BeginLoadData();
foreach (DataRow ConnectionRow in tblConnection.Rows)
{ DataRow[] SessionRows = ConnectionRow.GetChildRows(Rel2);
  if (SessionRows != null && SessionRows.Length > 0)
  { object[] ConnectionArray = ConnectionRow.ItemArray;
    foreach (DataRow SessionRow in SessionRows)
    { DataRow[] CommandRows = SessionRow.GetChildRows(Rel);
      if (CommandRows != null && CommandRows.Length > 0)
      { object[] SessionArray = SessionRow.ItemArray;
        foreach (DataRow CommandRow in CommandRows)
        { object[] CommandArray = CommandRow.ItemArray;
          object[] JoinArray = new object[ConnectionArray.Length
            + SessionArray.Length + CommandArray.Length];
          Array.Copy(ConnectionArray, 0, JoinArray, 0, ConnectionArray.Length);
          Array.Copy(SessionArray, 0, JoinArray, ConnectionArray.Length, SessionArray.Length);
          Array.Copy(CommandArray, 0, JoinArray, ConnectionArray.Length +
          SessionArray.Length, CommandArray.Length);
          tblJoin.LoadDataRow(JoinArray, true);
        }
      }
    }
  }
}

9. Display the joined table to the screen

Here I used a DataGridView object. If you want, AutoSizeColumnsMode can be set so that the columns cover the widest content.
dataGridView1.DataSource = tblJoin;

We can also display only the columns that we like (not all columns).
We can put the server name in a text box so we can change the SSAS server easily.
The result contains 48 columns: (the tblConnection columns are not shown)

The code is attached (C#): App1v2 (it’s pptx, rename to .zip first).
The zip file contains the Visual Studio solution (VS 2008).
Version 1 (only Session and Command, no Connection):  App1 (it’s pptx, rename to .zip first)

Potential for further development:

  • Rename columns on the dataGridView to be more user friendly. Hide unnecessary columns.
  • Refresh every 2 seconds
  • Cascade drill down: you click + on a connection and it displays the sessions; you click + on a session and it displays the commands.
  • Execute other DMVs, such as displaying object resources (memory etc) used by each command or connection.
  • Give warning if the amount of resources used by a connection exceeds certain pre-set limits.

27 December 2010

Q&A on: Impersonation

Filed under: Analysis Services — Vincent Rainardi @ 10:05 am

Q: What’s the difference between impersonation and connection string?

A: Let’s assume that the source database is in Oracle, running on UNIX. In BIDS, in the data source we have Connection String and Impersonation Information:

  1. Connection String: the credential specified here is the database user account (user ID and password) that Oracle will accept, allowing us to read the rows in its tables.
  2. Impersonation Info: this is the Windows credential/user account under which Analysis Services will be running when it tries to connect to that UNIX server to talk to Oracle.

For number 1 above, we need to ask the Oracle DBA (or Sybase, Informix, DB2, etc., depending on what the source RDBMS is) to create a database user account for us. If the RDBMS is running on Windows, this can be a Windows user account, provided this account is authenticated by the RDBMS. If the source RDBMS is SQL Server, then this can be a SQL Server login.

Number 2 must be a Windows account, because Analysis Services runs on Windows. Let's say that our SSAS server is running (all the time) under a service account called Domain1\ServiceAccount1. In the example above, when SSAS connects to the UNIX server where Oracle is running, we may want SSAS to connect using Domain1\ServiceAccount2, which has fewer privileges than ServiceAccount1. In this case ServiceAccount2 is the impersonation account.

Q: Are the connection string & impersonation in SSMS the same as the ones in BIDS?

A: Yes, they are the same. BIDS is just a tool to define the connection string & impersonation. In BIDS' Solution Explorer, when we right click on Data Source, we are defining the connection string and the impersonation information. When we deploy the AS DB, these 2 things are copied across (implemented) to the AS server. That's what we see in SSMS.

Q: Can we process a cube from a SQL server source using SQL Server authentication?

A: No, we can't "process" a cube using a SQL Server account. SSAS needs a Windows account to impersonate when processing an SSAS database. "Process" doesn't mean just connecting to the data source; "process" means connect, read data, populate the dimensions and cubes, calculate aggregates and indexes, etc.

So yes, you can connect to the SQL Server source database using SQL Server authentication, but you can’t process the cube using a SQL Server account.

Q: If we can access the source database using connection string, why do we need impersonation?

A: If you use a user name and password in the connection string, they are used to connect to the source database. The impersonation account is not only used to connect to the source relational database, but also to maintain the AS OLAP DB.

Q: Why can’t I use my Windows user account for impersonation?

A: You can use your Windows account, but there is a disadvantage when the cube is migrated to Production: a) if you use your user account to connect to the relational DB, other developers can't maintain the BIDS project, and b) when the project is deployed to Production it won't run, because your account has access to the dev database but not to the prod database. Using a service account solves these problems. Impersonation specifies which account the cube uses to connect to the relational database.

Analysis Services is a Windows service, running as the SSAS service account (to view this, double click the SSAS service in the Control Panel, then click on the Log On tab; see screenshot below).

The SSAS service account does not necessarily have access to the database. Hence the concept of impersonation: at run time the SSAS service account pretends to be (impersonates) the impersonation account to access the database.

Q: How many modes of impersonation are there and what are the differences between them?

A: There are four impersonation modes:

a) Use a specific Windows user name & password: a domain user account created specifically for this purpose. This is the best practice, because all the impersonation account needs is read access to the source database.

b) Use the service account: the SSAS service account is usually set up as a powerful domain account which has access to multiple SQL Servers. Using the SSAS service account gives you access to many databases on many SQL Servers; you don't have to worry about which account has access to which databases, as in a) above.

c) Use the credentials of the current user
The SSAS service will use your Windows account to access the database. When the cube is under development this is handy if you have access to the dev database: you can start the development straight away, without waiting for any account to be set up. If you hand over the BIDS project to a fellow developer, however, your colleague needs access to that source database too; otherwise he can't process the cube.
Note that if you do this, when the cube is moved to production you will need to change the account to a proper production account as in a) above, or use the service account as in b) above. As a developer, your Windows account does not normally have access to production databases. I'd be surprised if it did, because that would create a security issue.

d) Inherit
SSAS will use whatever is specified in the Database Properties. To view this, in SSMS right click an SSAS database and choose Properties; under Security Settings there is Data Source Impersonation Info. This is what Inherit uses. See screenshot below.

25 December 2010

Who Are Using The Cubes Now?

Filed under: Analysis Services — Vincent Rainardi @ 1:26 pm

To find out who is accessing the cubes right now, there are several ways:

  1. Query the SSAS query log table
  2. Use Activity Viewer application
  3. Query the Discover_Connections DMV (or the schema rowset XMLA)

1. Query log table

select MSOLAP_Database, max(StartTime) as last_access_time,
MSOLAP_User, count(*) as hits, sum(duration) as duration
from OlapQueryLog
where MSOLAP_User is not null -- exclude system/internal
and StartTime >= getdate()-0.5/24 -- in the last 1/2 hour
and MSOLAP_User <> 'Domain\MyUser' -- exclude me
and MSOLAP_User <> 'Domain\ServiceUser' -- exclude service account
group by MSOLAP_Database, MSOLAP_User
order by MSOLAP_Database desc, max(StartTime) desc

Note:

  1. We need to activate the query log first. See William E. Pearson III's article here and Edward Melomed's article here, both on how to configure the SSAS Query Log.
  2. If you do enable the query log on a busy production machine, I would advise archiving the entries to a "usage mart" (see here) and truncating the table every day/week.
  3. To see individual log entries, remove the group by clause.
  4. To see all queries by a particular user:
    select * from OlapQueryLog where MSOLAP_User = 'Domain\User1' order by StartTime desc
  5. To find out what the user in #4 above was executing, look at the Dataset column. It is a series of 0s and 1s with several commas, like this:
    0100,00000,010,00,000
    This means: the measure group has 5 dimensions (the groups of digits separated by commas). The query is using dim1 and dim3 (the ones with a 1 in them). Dim1 has 4 attributes, dim2 has 5 attributes, dim3 has 3 attributes, dim4 has 2 attributes and dim5 has 3 attributes.
  6. To know what dim1 and dim3 are in #5 above, look at the cube structure tab in BIDS: that's the order. Alternatively you can script the cube to XMLA and look under the Cube Dimension section (not the database dimensions but the cube dimensions). I should mention that sometimes the order of the dimensions in the Query Log doesn't match the cube dimension order. I have seen cases where the order of the dims in the cube dimensions is 1,2,3,4,5,6,7,…,20,21,22 but the order in the Dataset column of the query log is 1,2,3,4,5,6,9,11,12,13,14,8,10,15,16,17,18,19,20,7,21,22. So I would advise testing: query the cube with just 1 dim on the rows and compare the value in the Dataset column.
  7. In #5 above, if all attributes in a dim have the value 1, it means that one of the attributes on that dim has a default value. For example:
    0100,00000,010,00,111
    Dim5 has all attributes = 1, which means that 1 (or more) attribute on Dim5 has a default value.
  8. The number of digits in the attribute count in #5 above includes hidden attributes and disabled attributes, but excludes the hierarchies. For example: dim product has 3 attributes. Attribute 1 is disabled (enabled = false), attribute 2 is hidden (visible = false) and attribute 3 is active. The number of digits for dim product is 3, not 2 or 1, i.e. 000, not 00 or 0. If dim product has a hierarchy, the number of digits is still 3, i.e. 000, not 0000.

2. Use Activity Viewer application

Activity Viewer is an application that enables us to see what is currently happening on the SSAS server, i.e. users, sessions, queries, I/O and CPU. It was written by Luis Ballesteros, Alli Curley, Malini Jagannadhan, Meera Srinivasan and others. The 2008 version is here. The 2005 version is here.

What I describe here is for the 2008 version. When started for the first time, it asks for the SSAS server name. This server name is stored in an XML file in the same folder as the exe. We can add more than 1 server.

Then click on “Users” on the left. This displays who’s currently connecting to the SSAS server. If a user has high memory or CPU, click on that user then click on Details at the bottom. This displays the session details. Scroll to the right to find ‘Session Last Command’ column, which contains the MDX executed by that user. Unfortunately we can’t copy the MDX.

If we want to terminate a session (I usually combine with OLAP Heart Beat), click on Cancel Session on the bottom right. Click ‘Back’ to go back to the user list.

3. Query the Discover_Connections DMV

Open a DMX window and type “select * from $system.discover_connections”

For details see SSAS DMV.

Comparing Oracle 11g OLAP Option and SSAS

For my next DWBI project, I’m thinking about implementing Oracle 11g OLAP option instead of SSAS, so I’m comparing them.

  1. Browsability: users can use OBI (e.g. Answers), BO (since 9i AW/2003), Excel or other tools to browse Oracle OLAP. Users are not expected to use SQL to query the cube. In AS we now have Crescent to look forward to; in the meantime we can use Tableau (see Jen's session at SQLBits 8), Strategy Companion, CubePlayer or Excel 2010.
  2. Oracle OLAP cubes are stored as a true multidimensional store (I'm interpreting this as compressed MOLAP), not as relational tables with materialized views (MVs). This MOLAP storage solves the classic ROLAP problem of "MV aggregates can't cover every permutation of every member of every attribute against other members of other attributes".
  3. Oracle mentions compression & cost-based aggregation (see page 7). SSAS is certainly compressed, but I’m not sure if SSAS aggregations are cost-based.
  4. Oracle OLAP has the ability to blend multidimensional and SQL models within the same query (see Bud's comments below). This "blend concept" is similar to SQL Server's BISM concept in Denali (2011). BISM can expose itself through a multidimensional interface (MDX) or a SQL/relational interface (DAX). The difference is that BISM is relational (tables), with "in-memory column" storage (VertiPaq), whereas 11g's OLAP is compressed MOLAP.
  5. Many-to-many relationships. If you are not familiar with M2M in SSAS, please read Marco's paper about the M2M Revolution. This is critical for many business situations; can't live without it. I searched the whole library and couldn't find M2M in 11g OLAP. Bud said we can do it (see below), and from what I understand here, this is done by simply mapping a base measure to the fact table containing the M2M relationship, just like the way we create any base measure. Not sure how we hide the measure though (because the measure is a dummy, only used to establish the M2M relationship between 2 dimensions).
  6. Yes, 11g OLAP Option can create calculated measures. But as we in the SSAS world know, a calculated measure such as YTD or growth is not a problem if it's only 1 or 2 measures; if we have 50 measures we use the date tool, aka date calculation dimension (see Chris, Alberto & Marco's most useful SSAS book here, chapter 6). It seems that we can't do this in 11g OLAP. Surely we can't create 50 * 10 calc measures? Well we can, but how about the usability?
    Update 27/12: I have read Bud's comment below, and I know we can create a "measure dim" and a "calculation dim", but in SSAS we need to create a hidden measure and write some "MDX calculations" to establish how the growth, 2 yr growth, etc. will be calculated. I have read about creating calc measures in 11g here, but couldn't find how that "MDX calculation" is done in Oracle OLAP. Could anybody point me in the right direction?
  7. Complex calculations/MDX, i.e. existing (results depend on what's on the slicer), hidden named sets which are used by a calc member, etc. How do we do that in 11g OLAP?
    Update 27/11: I have explored Simba's MDX Provider for Oracle OLAP (link, link, link) and could not find comprehensive OLAP/MDX functions such as this. It seems that Simba's product only provides connectivity, not the ability to do complex calculations. I would be grateful if somebody could enlighten me (with a list of Simba's MDX functions for Oracle 11g OLAP). I have also looked at Bizzantz's DeltaMaster methods, but it's far below the richness of AS' MDX, e.g. scoping, existing, etc.
  8. MDX interface. 11g OLAP does mention an MDX interface to OBIEE, but I expect this is a basic "select … on columns, … on rows from cube" without the ability to use MS' MDX functions such as NECJ, parallel period, etc. See page 14.
    Update 27/11: same comment as point 7. Update 28/11: see OLAP DML Functions here.
  9. Measure expressions. These are required, for example, for currency conversion. If we don't have measure expressions in 11g OLAP (and we don't have M2M either), how do we do currency conversion?
    Update 27/11: we can do this in 11g OLAP using a "calculated measure" (see here); we just need to specify the correct "calculation type" and "calculation". The functions are quite comprehensive with regards to period/time series computation, but outside periods I'm not sure. Update 28/11: found it. It's called OLAP DML Functions, see here. It's very comprehensive.
  10. Dimensional/cell security. In AS I sort of take it for granted that group1 can only see member1 whereas group2 can only see member2, along with giving users different default members based on their role. But in 11g OLAP?
    Update 28/11: as Bud said, yes, we can do dimensional security in Oracle OLAP, see "Selecting data by criteria" here.
  11. In AS we can control incremental cube processing, i.e. process changed partitions only, etc. It is mentioned that in 11g OLAP this can be automated (see the end of page 28 re fast cube refresh), and that it can update incrementally by reading the log files (see the last paragraph of page 6). This is a good thing (compared to AS).

The Oracle OLAP blog is here. The wiki is here. The 11g OLAP user guide is here. The OLAP Option documentation is here.

If you find something incorrect above, I'd be grateful if you could let me know, either via the comments or vrainardi@gmail.com, and I will correct it. Thanks again to Bud Eggress for your corrections.

