Data Platform and Data Science

27 July 2010

MDX: Except

Filed under: Analysis Services,MDX — Vincent Rainardi @ 1:52 pm
Tags: ,

Background

It’s a stock brokerage firm. They do some promotion campaigns and would like to know how many customer accounts in each campaign never bought or sold a certain stock. They would like to have this as a calculated measure in the cube.

Let’s do it in SQL first, then in MDX. This way we know what the result should be, so we can verify the MDX output.

So the article will flow like this:

  1. Create SQL tables. Do this as simple as possible. Only create the neccesary columns.
  2. We populate these tables with data.
  3. Write a SQL query which gives us the count of customer accounts in each campaign that never bought or sold a certain stock (say ‘stock1’)
  4. Create the SSAS cube from those tables.
  5. Write an MDX query which gives us the count of customer accounts in each campaign that never bought or sold stock1.
  6. Explain the MDX

If you are looking only for the MDX just scroll down to the end of this article.

1. Create the SQL tables

Let’s create the table:

— Create tables
create table dim_campaign
( campaign nvarchar(10) )
— simplified (no media, target, design)

create table dim_account
( account nvarchar(10) )
— simplified version (no name, number, customer details)

create table dim_stock
( stock nvarchar(10) )
–simplified (no symbol, company name, sector, segment, cap)

create table fact_campaign
( campaign nvarchar(10),
account nvarchar(10),
hit int
)
— simplified version (no run date, duration & budget)

create table fact_transaction
( account nvarchar(10),
stock nvarchar(10),
qty int
)
— simplified (no dates, price, amount)

I simplified every table. Only creating columns which were absolutely necessary.

2. Populate the tables

— Dim Campaign
insert into dim_campaign values (‘campaign1’)
insert into dim_campaign values (‘campaign2’)

— Dim Account
insert into dim_account values (‘account1’)
insert into dim_account values (‘account2’)
insert into dim_account values (‘account3’)
insert into dim_account values (‘account4’)

— Dim Stock
insert into dim_stock values (‘stock1’)
insert into dim_stock values (‘stock2’)

— Fact Campaign
insert into fact_campaign values (‘campaign1’, ‘account1’)
insert into fact_campaign values (‘campaign1’, ‘account2’)
insert into fact_campaign values (‘campaign2’, ‘account3’)
insert into fact_campaign values (‘campaign2’, ‘account4’)

— Fact Transaction
insert into fact_transaction values (‘account1’, ‘stock1’,100)
insert into fact_transaction values (‘account2’, ‘stock1’,100)
insert into fact_transaction values (‘account3’, ‘stock1’,100)
insert into fact_transaction values (‘account2’, ‘stock2’,100)
insert into fact_transaction values (‘account3’, ‘stock2’,100)
insert into fact_transaction values (‘account4’, ‘stock2’,100)

Now let’s query them:

select * from dim_campaign
select * from dim_account
select * from dim_stock
select * from fact_campaign
select * from fact_transaction

3. Write the T-SQL query:

So the tables are all populated now. We need to find out how many accounts in each campaign that never bought or sold stock1.

select distinct account from fact_transaction where stock <> ‘stock1’

Output: account2, account3, account4

This is not what we want. Account2 and account3 have purchased stock1.
What we want is account4 only.

So looking at the fact_campaign table, campaign1 is for account1 and 2, and campaign2 is for account3 & 4:

So how many accounts in each campaign that never bought or sold stock1?
Campaign1   0
Campaign2   1   (account4)

That’s the answer. Now let’s write the T-SQL query, then the MDX query.

select distinct account from fact_transaction
where account not in (select distinct account from fact_transaction where stock = ‘stock1’)
Output: account4

Good, that’s heading to the right direction. Now bring-in the campaign:

select distinct FT.account, FC.campaign
from fact_transaction FT
join fact_campaign FC on FC.account = FT.account
where FT.account not in (select distinct account from fact_transaction where stock = ‘stock1’)

Output:

So far so good. Now let’s list down all campaign, then join to the above output.

select DC.campaign, count(x.account) as count_of_accounts
from dim_campaign DC
left join
( select distinct FT.account, FC.campaign
from fact_transaction FT
join fact_campaign FC on FC.account = FT.account
where FT.account not in (select distinct account from fact_transaction where stock = ‘stock1’)
) x on x.campaign = DC.campaign
group by DC.campaign

Output:

In the above SQL, we do “group by and count”, because the requirement is to find out how many accounts in each campaign that never bought or sold stock1.

So that’s the SQL query. Now let’s do the MDX. But before that we need to create the cube first.

4. Create the cube

Data source view:

Cube structure:
(as the dimensions are straight forward, I used the cube wizard to create the dimensions)

Define many-to-many relationship:

Browse the cube:

5. Write the MDX

Now we write the MDX:

with member [Measures].[X] as
distinctcount
( nonemptycrossjoin
  ( { [Measures].[Qty] },
    { except
      ( [Dim Stock].[Stock].[Stock].Members,
        [Dim Stock].[Stock].CurrentMember
      )
    },
    { [Dim Account].[Account].[Account].Members }
  )
)
select X on columns,
non empty [Dim Campaign].[Campaign].[Campaign].members on rows
from [Campaign]
where [Dim Stock].[Stock].&[stock1];

Output:

This is not what we are looking for, because account1, account2 and account3 did bought stock1. The question was: how many accounts in each campaign that never bought or sold stock1? So for campaign1 (account 1 & 2) the answer is 0, and for campaign2 (account 3 & 4) the answer is 1 (only account4).

Here’s the correct MDX:

with
set set1 as
nonempty
( [Dim Account].[Account].[Account].Members,
  [Dim Stock].[Stock].&[stock1]
)
member x as
count
( except
  ( nonempty
    ( [Dim Account].[Account].[Account].Members,
      ( [Dim Campaign].[Campaign].CurrentMember,
        [Measures].[Qty]
      )
    ), set1
  )
)
select {x} on columns,
non empty [Dim Campaign].[Campaign].[Campaign].members on rows
from [Campaign]

Output:

6. Explanation

set1 is the accounts which have bought/sold stock1, i.e. {account1, account2, account3} as shown below:

with set set1 as
nonempty
( [Dim Account].[Account].[Account].Members,
  [Dim Stock].[Stock].&[stock1]
)
select set1 on columns
from [Campaign]

The nonempty inside the except is to get the accounts for each campaign:

with set x as
nonempty
( [Dim Account].[Account].[Account].Members,
  ( [Dim Campaign].[Campaign].CurrentMember,
    [Measures].[Qty]
  )
)
select {x} on columns,
non empty [Dim Campaign].[Campaign].[Campaign].members on rows
from [Campaign]

Output:

Except takes a way member from a set. For example, campaign2 is for account 3 & 4. So if we take away account1,2&3 from account3&4 what’s left is account4:

with set x as
except
( { [Dim Account].[Account].[Account].[Account3],
    [Dim Account].[Account].[Account].[Account4]
  },
  { [Dim Account].[Account].[Account].[Account1],
    [Dim Account].[Account].[Account].[Account2],
    [Dim Account].[Account].[Account].[Account3]
  }
)
select {x} on columns
from [Campaign]

Output:

The complete MDX is basically saying:

  1. Create set1 = all account which have bought or sold stock1
  2. Find out accounts for each campaign using nonempty
  3. Take away 1 from 2 using except
  4. Count the members in that set
with
set set1 as
nonempty
( [Dim Account].[Account].[Account].Members,
  [Dim Stock].[Stock].&[stock1]
)
member x as
count
( except
  ( nonempty
    ( [Dim Account].[Account].[Account].Members,
      ( [Dim Campaign].[Campaign].CurrentMember,
        [Measures].[Qty]
      )
    ), set1
  )
)
select {x} on columns,
non empty [Dim Campaign].[Campaign].[Campaign].members on rows
from [Campaign]

6 Comments »

  1. Hi Vincent,

    Great post. But I have a tiny correction on your explanation about set1. You’re saying set1 = all account which never bought or sold stock1, actually is the other way around. set1 = all account which bought or sold stock1.

    Cheers,
    Uzzie

    Comment by Uzzie — 28 July 2010 @ 5:56 am | Reply

  2. Thank you for the post. However, this solution is based on 2 Many to Many dimension relationships. What happens if your 2 measure groups are 500 million records each? Would you recommend this solution or look at an alernative?

    Comment by Robert — 23 May 2011 @ 4:50 pm | Reply

    • Thank you for your comment, Robert. I don’t know what will happen if there are 500m rows on the MGs, but I would try to use this solution first, to see the performance. If it’s too slow then I’d look for other ways. To be honest I don’t know any alternative to find out how many accounts in each campaign that never bought or sold stock1 (generic form: how many dim2 members for each dim1 member that has no transaction with dim3 members). This method is the only method I know (in multi dim world that is). If you know an alternative way I would be grateful if you could share it.
      I guess we could try optimising “the SQL way” (relational world) that I showed in the article, e.g. index, indexed/materialised view, partitioned index, etc. One thing that we can try when having a performance issue on many-to-many in AS is to materialise it. But I found that sometimes when we materialise the MG, the query results (the numbers) could be different.

      Comment by Vincent Rainardi — 23 May 2011 @ 7:22 pm | Reply

  3. I agree in the mdx\dimensional world the use of M2M is really the only way to answer the question that you are asking. However, my comment is more geared around if a user does not want to look at the data by the dimensions related to the second measure group. In this case I have gone down the path of creating a singular fact table that provides a measure for the values needed. Now, on first guess this seems limited, but this view can now be exposed to the relational user as well and not just the mdx user.
    I have posed other questions on how to determine what is used to translate the underlying relationship in the M2M. So, if we have 2 MGs and 15 dimensions and we share 5 of the dimensions. Are all 5 of the shared dimensions used in the M2M relationship to work out the intermediate values?

    Thanks

    Comment by Robert — 23 May 2011 @ 7:57 pm | Reply

    • Thank you for sharing it Robert. About your other question, I was asking the same questions myself, e.g. which dim(s) & which measure(s) does M2M use. I don’t know the answer (yet) but I found that M2M is not always right. I wrote an article about it but didn’t get a chance to finish it. I will find the article, find the answer(s), and share it with you.

      Comment by Vincent Rainardi — 24 May 2011 @ 7:29 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a reply to Uzzie Cancel reply

Blog at WordPress.com.