Data Warehousing, BI and Data Science

30 July 2011

Bridge Table with Date

Filed under: Data Warehousing — Vincent Rainardi @ 3:06 am
Tags:

A bridge table is a fact table connecting 2 dimensions. A bridge table is usually used in data warehousing for multi valued attributes, see here. A bridge table has no measure (factless). Usually a bridge table contains only 2 columns, both are dim key columns, connecting the 2 dimensions (see my previous article about Bridge Table here*). But there are some cases where we have the 3rd column, which points to the 3rd dim. And this 3rd dimension contains the date. Hence the title of this article Bridge Table with Date.

*Also see my SQLBits Brighton presentation here, page 35 about bridge table.

As usual, it is easier to understand using an example. It’s a bank. A retail bank. Every 3 months or so they conducted a cross sell opportunity review. In the review they identify (programmatically) which products might be suitable for a customer. Products which the customer does not currently have. Hence the name “cross selling” opportunity, or CSO. The results of this CSO identification software are then reviewed manually by the customer service officers and they adjusted/overwritten the results. The final output is then sent to the bank’s data warehouse for storage and reporting.

Let’s have a look at the table structure:

create table dim_customer (customer_key int, customer_name varchar(50))
create table dim_cso (cso_key int, cso_product varchar(50)) --cross selling opportunity
create table dim_review (review_key int, review_number int, review_date date) --a review is conducted across all customers in the bank, about once a quarter
create table bridge_customer_review (customer_key int, review_key int, cso_key int)

insert into dim_customer values (1,'Customer1'), (2,'Customer2')
insert into dim_cso values (1,'Mortgage'), (2,'Investment'), (3,'Mortgage, Investment'), (4,'Credit Card'), (5,'Mortgages, Credit Card'), (6,'Investment, Credit Card'), (7, 'Mortgage, Investment, Credit Card')
insert into dim_review values (1, 1, '2011-04-29'), (2, 2, '2011-07-29')
insert into bridge_customer_review values (1,1,1), (1,1,2), (1,2,1), (1,2,2), (1,2,4), (2,1,1),(2,2,1),(2,2,2)

select * from dim_customer
select * from dim_cso
select * from dim_review
select * from bridge_customer_review

And these are the questions that the table structure was designed to answer:

-- Q1. How many customers have Investment cross selling opportunity?
select count( distinct customer_key) from bridge_customer_review where cso_key = 2
-- Output: 2

-- Q2. List of customers having Investment cross selling opportunity:
select distinct C.customer_name
from bridge_customer_review B
join dim_customer C on C.customer_key = B.customer_key
join dim_cso O on O.cso_key = B.cso_key
where O.cso_product = 'Investment'
/* Output:
Customer1
Customer2
*/

-- Q3. What are the CSOs for Customer1?
select B.review_key, O.cso_product from
( select customer_key, review_key, sum(cso_key) as cso_key from bridge_customer_review
  group by customer_key, review_key
) B
join dim_cso O on O.cso_key = B.cso_key
join dim_customer C on C.customer_key = B.customer_key
where C.customer_name = 'Customer1'
/* Output:
instance_dd cso_product
1  Mortgage, Investment
2  Mortgage, Investment, Credit Card
*/

-- Q4. For Customer1 how many reviews are Mortgage, Investment, Credit Card?
select count(*) from
( select customer_key, review_key, sum(cso_key) as cso_key from bridge_customer_review
  group by customer_key, review_key
) A
join dim_cso O on O.cso_key = A.cso_key
join dim_customer C on C.customer_key = A.customer_key
where C.customer_name = 'Customer1' and O.cso_product = 'Mortgage, Investment, Credit Card'
-- Output: 1

--Q5. In the July review, which customers get cross selling recommendation of Mortgage and Investment?
select count(*) from
( select customer_key, review_key, sum(cso_key) as cso_key from bridge_customer_review
  group by customer_key, review_key
) A
join dim_cso O on O.cso_key = A.cso_key
join dim_customer C on C.customer_key = A.customer_key
join dim_review R on R.review_key = A.review_key
where R.review_date = '2011-07-29' and C.customer_name = 'Customer1'
and O.cso_product = 'Mortgage, Investment'
-- Output: 1

In the bridge table above, we have a customer key column and a CSO key column. The customer key column points to the customer dimension as usual, whereas the CSO key column points to the Cross Selling Opportunity dimension, which contains recommended cross sell products. The bridge table essentially is a factless fact table connecting these 2 dimensions. It contains which products are recommended for each customer for cross selling.

But in the bridge table, we also have the review key column, which points to the review dimension. This review dimension has a date, enabling us to find out in which review the CSO product recommendation originated. Hence the title of the article, Bridge Table with Date.

Also notice that in the CSO dimension, I applied the “1,2,4,8 keys” technique*, which I used here. The purpose of using “1,2,4,8 key” technique is so that the total is unique, for example:

  • 1+2 is 3 and 3 is not there in the set (we only have 1,2,4,8)
  • 1+4 is 5 and 5 is not there in the set

In the CSO dim it is optional to have rows for CSO key = 3,5,6,7. Without these rows, we can still decode the combination, like this:

To decode 3 (the answer is 1+2), we find out first what is the largest number in the (1,2,4,8) set which is <= 3. The answer is 2. The we find the largest number in the set which is <= 3-2. The answer is 1. So 3 is 2+1.

To decode 5 (the answer is 1+4), we find out first what is the largest number in the (1,2,4,8) set which is <= 5. The answer is 4. Then we find out what is the largest number in the set which is <= 5-4. The answer is 1. So 5 is 4+1.

But, it is a lot easier, if we have the rows for CSO key = 3,5,6,7 because then we can run queries like Q4 where we have this clause: “and O.cso_product = ‘Mortgage, Investment, Credit Card’” rather than trying to decode the product combination like above (mortgage + investment + credit card 1 + 2+ 4 = 7).

*binary keys is probably a better name than “1,2,4,8 keys”.

Vincent Rainardi, 30th July 2011

Advertisements

7 Comments »

  1. Thanks for writing this.

    Can the table structure designed answer all the above questions using SSAS Cube by Slice and Dice?

    Regards,
    Sandesh

    Comment by Sandesh Nagaraj — 8 March 2012 @ 8:58 pm | Reply

  2. Hi,
     
    I am still find it difficult to figure out answers to the above questions using SSAS Cube.
     
    I have created dim_cso, dim_review and review_key as dimension tables and bridge_customer_review as factless fact table.
     
    What must be the “Bridge Customer Review Count” in measure it is “Sum” or “count of Rows” ?
     
    Please advice…
     
    Regards,
    Sandesh

    Comment by mdxdna — 25 March 2012 @ 6:47 am | Reply

    • Hi Sandesh,
      the AggregateFunction is DistinctCount. We need to create a separate MG for this measure (right click on Bridge Customer Review Count, New Measure, Usage: Distinct count, source table: bridge_customer_review, source column: any of the 3.
      MDX to query:
      select [Measures].[Customer Key Distinct Count] on 0
      from [bridge]
      where [Dim Cso].[Cso Key].&[2];
      Output: 2

      Comment by Vincent Rainardi — 25 March 2012 @ 4:32 pm | Reply

  3. thanks for replying. but how do we decode/find for customers with multiple product (Q3. What are the CSOs for Customer1? i.e CSO key = 3,5,6,7). I need to use the “sum” in this case right? Please advice…

    Comment by mdxdna — 26 March 2012 @ 5:48 am | Reply

    • You can use either count or distinct count Sandesh. In the examples below Customer Key Distinct Count is using distinct count aggregation, and Bridge Customer Review Count is using count aggregation.

      select [Measures].[Customer Key Distinct Count] on columns,
      nonemptycrossjoin
      ( [Dim Customer].[Customer Name].[All].[Customer1],
      [Dim Cso].[Cso Product].[All].Children
      ) on rows
      from Cube;
      Output:
      Customer1 Credit Card 1
      Customer1 Investment 1
      Customer1 Mortgage 1

      select [Measures].[Bridge Customer Review Count] on columns,
      nonemptycrossjoin
      ( [Dim Customer].[Customer Name].[All].[Customer1],
      [Dim Cso].[Cso Product].[All].Children
      ) on rows
      from Cube;
      Customer1 Credit Card 1
      Customer1 Investment 2
      Customer1 Mortgage 2

      Comment by Vincent Rainardi — 29 March 2012 @ 6:45 am | Reply

  4. Thanks Vincent

    Comment by mdxdna — 29 March 2012 @ 9:19 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: