A bridge table is a fact table connecting 2 dimensions. A bridge table is usually used in data warehousing for multi valued attributes, see here. A bridge table has no measure (factless). Usually a bridge table contains only 2 columns, both are dim key columns, connecting the 2 dimensions (see my previous article about Bridge Table here*). But there are some cases where we have the 3rd column, which points to the 3rd dim. And this 3rd dimension contains the date. Hence the title of this article Bridge Table with Date.
*Also see my SQLBits Brighton presentation here, page 35 about bridge table.
As usual, it is easier to understand using an example. It’s a bank. A retail bank. Every 3 months or so they conducted a cross sell opportunity review. In the review they identify (programmatically) which products might be suitable for a customer. Products which the customer does not currently have. Hence the name “cross selling” opportunity, or CSO. The results of this CSO identification software are then reviewed manually by the customer service officers and they adjusted/overwritten the results. The final output is then sent to the bank’s data warehouse for storage and reporting.
Let’s have a look at the table structure:
create table dim_customer (customer_key int, customer_name varchar(50)) create table dim_cso (cso_key int, cso_product varchar(50)) --cross selling opportunity create table dim_review (review_key int, review_number int, review_date date) --a review is conducted across all customers in the bank, about once a quarter create table bridge_customer_review (customer_key int, review_key int, cso_key int) insert into dim_customer values (1,'Customer1'), (2,'Customer2') insert into dim_cso values (1,'Mortgage'), (2,'Investment'), (3,'Mortgage, Investment'), (4,'Credit Card'), (5,'Mortgages, Credit Card'), (6,'Investment, Credit Card'), (7, 'Mortgage, Investment, Credit Card') insert into dim_review values (1, 1, '2011-04-29'), (2, 2, '2011-07-29') insert into bridge_customer_review values (1,1,1), (1,1,2), (1,2,1), (1,2,2), (1,2,4), (2,1,1),(2,2,1),(2,2,2) select * from dim_customer select * from dim_cso select * from dim_review select * from bridge_customer_review
And these are the questions that the table structure was designed to answer:
-- Q1. How many customers have Investment cross selling opportunity? select count( distinct customer_key) from bridge_customer_review where cso_key = 2 -- Output: 2 -- Q2. List of customers having Investment cross selling opportunity: select distinct C.customer_name from bridge_customer_review B join dim_customer C on C.customer_key = B.customer_key join dim_cso O on O.cso_key = B.cso_key where O.cso_product = 'Investment' /* Output: Customer1 Customer2 */ -- Q3. What are the CSOs for Customer1? select B.review_key, O.cso_product from ( select customer_key, review_key, sum(cso_key) as cso_key from bridge_customer_review group by customer_key, review_key ) B join dim_cso O on O.cso_key = B.cso_key join dim_customer C on C.customer_key = B.customer_key where C.customer_name = 'Customer1' /* Output: instance_dd cso_product 1 Mortgage, Investment 2 Mortgage, Investment, Credit Card */ -- Q4. For Customer1 how many reviews are Mortgage, Investment, Credit Card? select count(*) from ( select customer_key, review_key, sum(cso_key) as cso_key from bridge_customer_review group by customer_key, review_key ) A join dim_cso O on O.cso_key = A.cso_key join dim_customer C on C.customer_key = A.customer_key where C.customer_name = 'Customer1' and O.cso_product = 'Mortgage, Investment, Credit Card' -- Output: 1 --Q5. In the July review, which customers get cross selling recommendation of Mortgage and Investment? select count(*) from ( select customer_key, review_key, sum(cso_key) as cso_key from bridge_customer_review group by customer_key, review_key ) A join dim_cso O on O.cso_key = A.cso_key join dim_customer C on C.customer_key = A.customer_key join dim_review R on R.review_key = A.review_key where R.review_date = '2011-07-29' and C.customer_name = 'Customer1' and O.cso_product = 'Mortgage, Investment' -- Output: 1
In the bridge table above, we have a customer key column and a CSO key column. The customer key column points to the customer dimension as usual, whereas the CSO key column points to the Cross Selling Opportunity dimension, which contains recommended cross sell products. The bridge table essentially is a factless fact table connecting these 2 dimensions. It contains which products are recommended for each customer for cross selling.
But in the bridge table, we also have the review key column, which points to the review dimension. This review dimension has a date, enabling us to find out in which review the CSO product recommendation originated. Hence the title of the article, Bridge Table with Date.
Also notice that in the CSO dimension, I applied the “1,2,4,8 keys” technique*, which I used here. The purpose of using “1,2,4,8 key” technique is so that the total is unique, for example:
- 1+2 is 3 and 3 is not there in the set (we only have 1,2,4,8)
- 1+4 is 5 and 5 is not there in the set
In the CSO dim it is optional to have rows for CSO key = 3,5,6,7. Without these rows, we can still decode the combination, like this:
To decode 3 (the answer is 1+2), we find out first what is the largest number in the (1,2,4,8) set which is <= 3. The answer is 2. The we find the largest number in the set which is <= 3-2. The answer is 1. So 3 is 2+1.
To decode 5 (the answer is 1+4), we find out first what is the largest number in the (1,2,4,8) set which is <= 5. The answer is 4. Then we find out what is the largest number in the set which is <= 5-4. The answer is 1. So 5 is 4+1.
But, it is a lot easier, if we have the rows for CSO key = 3,5,6,7 because then we can run queries like Q4 where we have this clause: “and O.cso_product = ‘Mortgage, Investment, Credit Card’” rather than trying to decode the product combination like above (mortgage + investment + credit card 1 + 2+ 4 = 7).
*binary keys is probably a better name than “1,2,4,8 keys”.
Vincent Rainardi, 30th July 2011