A bridge table is a fact table connecting 2 dimensions. A bridge table is usually used to solve “dimensions with multi valued attributes” as I showed in my previous article, “Dimensions with Multi Valued Attributes” here. Below is an example of a bridge table:
create table dim_product (product_key int, name varchar(50)) create table dim_colour (colour_key int, colour varchar(50)) create table bridge_product_colour (product_key int, colour_key int) insert into dim_product values (1,'Product1'), (2,'Product2') insert into dim_colour values (1,'Red'), (2,'Yellow'), (3,'Green') insert into bridge_product_colour values (1,1), (1,2), (2,2), (2,3) select * from dim_product select * from dim_colour select * from bridge_product_colour -- Q1. How many products contain colour Yellow? select count(*) from bridge_product_colour F join dim_colour C on F.colour_key = C.colour_key where C.colour = 'Red' Output: 1 -- Q2. How many product for each colour? -- (total is more than 3 because 1 product can have multiple colours) select C.colour, count(*) from bridge_product_colour B join dim_colour C on C.colour_key = B.colour_key group by C.colour Output: Red 1 Green 1 Yellow 2
Above is an example of a simple bridge table. It only connects the product and colour.
But sometimes, we have “instances”. Each item can have multiple instances. Each instance has different colour scheme, for example instance 1 of product 1 is red and yellow, whereas instance 2 is yellow and green.
In this case we need to create an “instance degenerate dimension” column in the bridge table, and we need to set the colour key to 1, 2, 4, 8, etc, so that when we sum the colour key, we will get a unique number.
An example of this is film. A film has multiple genres. Film1 is classified as “Action, Drama”. But Film1 can also be classified as “Action, Drama, Comedy”.
The example below illustrates this case:
create table dim_film (film_key int, title varchar(50)) create table dim_genre (genre_key int, genre varchar(50)) create table bridge_film_genre (film_key int, instance_dd int, genre_key int) insert into dim_film values (1,'Film1'), (2,'Film2') insert into dim_genre values (1,'Action'), (2,'Drama'), (3,'Action, Drama'), (4,'Comedy'), (5,'Action, Comedy'), (6,'Drama, Comedy'), (7, 'Action, Drama, Comedy') insert into bridge_film_genre values (1,1,1), (1,1,2), (1,2,1), (1,2,2), (1,2,4), (2,1,1),(2,2,1),(2,4,2) select * from dim_film select * from dim_genre select * from bridge_film_genre film_key instance_dd genre_key 1 1 1 1 1 2 1 2 1 1 2 2 1 2 4 2 1 1 2 2 1 2 4 2 -- Q1. How many films contain Drama genre? select count( distinct film_key) from bridge_film_genre where genre_key = 2 Output: 2 -- Q2. List of films containing Drama genre: select distinct F.title from bridge_film_genre B join dim_film F on F.film_key = B.film_key join dim_genre G on G.genre_key = B.genre_key where G.genre = 'Drama' Output: Film1 Film2 -- Q3. What are the genres for each instance of Film1? select B.instance_dd, G.genre from ( select film_key, instance_dd, sum(genre_key) as genre_code from bridge_film_genre group by film_key, instance_dd ) B join dim_genre G on G.genre_key = B.genre_code join dim_film F on F.film_key = B.film_key where F.title = 'Film1' Output 1 Action, Drama 2 Action, Drama, Comedy -- Q4. For Film1 how many are Action, Crime, and Comedy? select count(*) from ( select film_key, instance_dd, sum(genre_key) as genre_code from bridge_film_genre group by film_key, instance_dd ) A join dim_genre B on B.genre_key = A.genre_code join dim_film C on C.film_key = A.film_key where C.title = 'Film1' and genre = 'Action, Drama, Comedy' Output: 1
Notice how value genre_key is set to 1,2,4, etc so that its sum is unique. But we created rows in the dim_genre for genre_key = 3,5,6, and 7 to so that we can join back to get the description.
I would like to thank Rohinton Tarapore for highlighting the problem to me.
Vincent Rainardi, 27th July 2011
[…] both are dim key columns, connecting the 2 dimensions (see my previous article about Bridge Table here*). But there are some cases where we have the 3rd column, which points to the 3rd dim. And this 3rd […]
Pingback by Bridge Table with Date « Data Warehousing and Business Intelligence — 30 July 2011 @ 3:06 am |
1)What is the purpose of instance_dd column and how to populate it?
For Example:-(1,1,1)
First 1 denotes it corresponds to film 1 based on film_key
Second 1 is not clear to me?
Third 1 denotes it is action film based on genre_key
2)Here manually you are populating bridge_film_genre table using insert statements.
But in real data ware house project how you will populate the Bridge table?Is it based on dimension tables or what is the logic for it?
Comment by Vamsi Krishna — 27 September 2016 @ 9:38 am |
Hi Vamsi,
1) The purpose of instance_dd column is to differentiate two classifications that exist for film1, i.e. this film is known/classified as a “action, drama” film and also known/classified as a “action, drama, comedy” film. The second “1” means that this is the first known classification. If the instance_dd value is 2, it means that this row is the second known classification for this film.
2) It is based on the source system which stores which film is known/classified as what genre(s). Note that this data is time variant, i.e. it changes over time. For example, film1 on 5th Jan 2016 was known as “action, drama” and “action, drama, comedy” as above, but on 25th June 2016 it is known as “action, drama, comedy” only. In this instance on 25th June 2016 we expire the rows with instance_dd = 1.
Hope this helps.
Comment by Vincent Rainardi — 28 September 2016 @ 9:00 pm |